Method and system for learning a same-material constraint in an image

ABSTRACT

In a first exemplary embodiment of the present invention, an automated, computerized method is provided for processing an image. According to a feature of the present invention, the method comprises the steps of providing an image file depicting an image, in a computer memory, assembling a feature vector for the image file, the feature vector containing information regarding a likelihood that a selected pair of regions of the image file are of a same intrinsic characteristic, providing a classifier derived from a computer learning technique, computing a classification score for the selected pair of regions of the image file, as a function of the feature vector and the classifier and classifying the regions as being of the same intrinsic characteristic, as a function of the classification score.

This is a continuation of U.S. patent application Ser. No. 12/584,910, filed Sep. 15, 2009 and hereby incorporated by reference herein.

BACKGROUND OF THE INVENTION

Many significant and commercially important uses of modern computer technology relate to images. These include image processing, image analysis and computer vision applications. A challenge in the utilization of computers to accurately and correctly perform operations relating to images is the development of algorithms that truly reflect and represent physical phenomena occurring in the visual world. For example, the ability of a computer to correctly and accurately distinguish between a shadow and a material object within an image has been a persistent challenge to scientists. Such an ability can be particularly critical, for example, in computer vision applications, as may be implemented in a robot or a security camera used to identify objects moving through a selected field of view. A computer must be able to identify structures and features of a scene that can be modified in appearance due to overlying shadows. Cognitive processing of the human brain makes it possible for humans to automatically distinguish shadow from object. However, to a computer, it is all pixel values of varying color characteristics. Accordingly, there is a persistent need for the development of accurate and correct techniques that can be utilized in the operations of computers relating to images, to provide improved image appearance.

SUMMARY OF THE INVENTION

The present invention provides a method and system implementing image processing techniques that utilize spatio-spectral information relevant to an image, to perform an operation to accurately and correctly identify and separate illumination and material aspects of the image.

In a first exemplary embodiment of the present invention, an automated, computerized method is provided for processing an image. According to a feature of the present invention, the method comprises the steps of providing an image file depicting an image, in a computer memory, assembling a feature vector for the image file, the feature vector containing information regarding a likelihood that a selected pair of regions of the image file are of a same intrinsic characteristic, providing a classifier derived from a computer learning technique, computing a classification score for the selected pair of regions of the image file, as a function of the feature vector and the classifier and classifying the regions as being of the same intrinsic characteristic, as a function of the classification score.

According to a further feature of the first exemplary embodiment of the present invention, the method comprises the further steps of defining a constraint between the regions, when a classification score indicates the regions as being of the same intrinsic characteristic, and performing an optimization operation as a function of the constraint to generate an intrinsic image corresponding to the image. The intrinsic image comprises an illumination image and/or a material image. As a further feature of the first exemplary embodiment, the same intrinsic characteristic comprises a same-material for the regions.

In a second exemplary embodiment of the present invention, a computer system is provided. The computer system comprises a CPU and a memory storing an image file containing an image. According to a feature of the present invention, the CPU is arranged and configured to execute a routine to assemble a feature vector for the image file, the feature vector containing information regarding a likelihood that a selected pair of regions of the image file are of a same intrinsic characteristic, provide a classifier derived from a computer learning technique, compute a classification score for the selected pair of regions of the image file, as a function of the feature vector and the classifier and classify the regions as being of the same intrinsic characteristic, as a function of the classification score.

In a third exemplary embodiment of the present invention, a computer program product is provided. According to a feature of the present invention, the computer program product is disposed on a computer readable media, and the product includes computer executable process steps operable to control a computer to: assemble a feature vector for an image file, the feature vector containing information regarding a likelihood that a selected pair of regions of the image file are of a same intrinsic characteristic, provide a classifier derived from a computer learning technique, compute a classification score for the selected pair of regions of the image file, as a function of the feature vector and the classifier and classify the regions as being of the same intrinsic characteristic, as a function of the classification score.

In a fourth exemplary embodiment of the present invention, a computer program product is provided. According to a feature of the present invention, the computer program product is disposed on a computer readable media, and the product includes computer executable process steps operable to control a computer to: train a classifier for use in a computer learning technique, as a function of feature vectors derived from image files from a training set, and coded ground truth versions of the image files of the training set.

In a fifth exemplary embodiment of the present invention, an automated, computerized method is provided for processing an image. According to a feature of the present invention, the method comprises the steps of providing an image file depicting an image, in a computer memory, identifying a set of indicia measuring one of similarities and dissimilarities for selected pairs of regions of the image, transforming similarity and dissimilarity information derived from the set to pairwise distances, performing a clustering operation as a function of the pairwise distances and defining same intrinsic characteristic constraints as a function of clusters.

In a sixth exemplary embodiment of the present invention, a computer program product is provided. According to a feature of the present invention, the computer program product is disposed on a computer readable media, and the product includes computer executable process steps operable to control a computer to: provide an image file depicting an image, in a computer memory, identify a set of indicia measuring one of similarities and dissimilarities for selected pairs of regions of the image, transform similarity and dissimilarity information derived from the set to pairwise distances, perform a clustering operation as a function of the pairwise distances and define same intrinsic characteristic constraints as a function of clusters.

In a seventh exemplary embodiment of the present invention, an automated, computerized method is provided for processing an image. According to a feature of the present invention, the method comprises the steps of providing an image file depicting an image, in a computer memory, identifying a set of indicia measuring one of similarities and dissimilarities for each of a series of selected pairs of regions of the image, selecting a pair of regions of the image from the series, performing a segregation of the image into intrinsic images, assuming a same intrinsic characteristic for the selected region, evaluating the intrinsic images according to preselected criteria, retaining the pair as being of the same intrinsic characteristic when the evaluation satisfies the criteria and repeating the selecting, performing, evaluating and retaining steps for each pair of the series.

In an eighth exemplary embodiment of the present invention, a computer program product is provided. According to a feature of the present invention, the computer program product is disposed on a computer readable media, and the product includes computer executable process steps operable to control a computer to: provide an image file depicting an image, in a computer memory, identify a set of indicia measuring one of similarities and dissimilarities for each of a series of selected pairs of regions of the image, select a pair of regions of the image from the series, perform a segregation of the image into intrinsic images, assuming a same intrinsic characteristic for the selected region, evaluate the intrinsic images according to preselected criteria, retain the pair as being of the same intrinsic characteristic when the evaluation satisfies the criteria and repeat the selecting, performing, evaluating and retaining steps for each pair of the series.

In accordance with yet further embodiments of the present invention, computer systems are provided, which include one or more computers configured (e.g., programmed) to perform the methods described above. In accordance with other embodiments of the present invention, computer readable media are provided which have stored thereon computer executable process steps operable to control a computer(s) to implement the embodiments described above. The present invention contemplates a computer readable media as any product that embodies information usable in a computer to execute the methods of the present invention, including instructions implemented as a hardware circuit, for example, as in an integrated circuit chip. The automated, computerized methods can be performed by a digital computer, analog computer, optical sensor, state machine, sequencer, integrated chip or any device or apparatus that can be designed or programmed to carry out the steps of the methods of the present invention.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a computer system arranged and configured to perform operations related to images.

FIG. 2 shows an n×m pixel array image file for an image stored in the computer system of FIG. 1.

FIG. 3 is a flow chart showing a data flow in an automated computer operation to identify illumination and material aspects of an image, according to a feature of the present invention.

FIG. 4 is a functional block diagram of an image segregation system architecture, implemented in the computer system of FIG. 1, for performance of the computer operation of FIG. 3, according to a feature of the present invention.

FIG. 5 shows a graphical user interface for use in connection with an implementation of the image segregation system architecture feature of the present invention.

FIG. 6 a is a flow chart for identifying Type C token regions in the image file of FIG. 2 a, according to a feature of the present invention.

FIG. 6 b is an original image used as an example in the identification of Type C tokens.

FIG. 6 c shows Type C token regions in the image of FIG. 6 b.

FIG. 6 d shows Type B tokens, generated from the Type C tokens of FIG. 6 c, according to a feature of the present invention.

FIG. 7 is a flow chart for a routine to test Type C tokens identified by the routine of the flow chart of FIG. 6 a, according to a feature of the present invention.

FIG. 8 is a flow chart for constructing Type B tokens via an arbitrary boundary removal technique, according to a feature of the present invention.

FIG. 9 is a flow chart for creating a token graph, containing token map information, according to a feature of the present invention.

FIG. 10 is a flow chart for constructing Type B tokens via an adjacent planar token merging technique, according to a feature of the present invention.

FIG. 11 is a flow chart for generating Type C tokens via a local token analysis technique, according to a feature of the present invention.

FIG. 12 is a flow chart for constructing Type B tokens from Type C tokens generated via the local token analysis technique of FIG. 11, according to a feature of the present invention.

FIG. 13 is a flow chart for training a classifier, in connection with a computer learning technique for use in the identification of a same intrinsic characteristic of an image, such as same-material regions of an image, according to a feature of the present invention.

FIG. 14 is a flow chart for the use of the classifier, trained according to the flow chart of FIG. 13, to identify adjacent Type C tokens of the same-material, as a basis for identification of a same-material region of the image.

FIG. 15 is a flow chart for constructing a same-material constraint based upon sparse clustering, according to a feature of the present invention.

FIG. 16 is a flow chart for a further method of clustering same-material regions of an image, according to a feature of the present invention.

FIG. 17 is a representation of an [A][x]=[b] matrix relationship according to a feature of the present invention.

FIG. 18 is a functional block diagram for a service provider component for use in the image segregation system architecture of FIG. 4.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Referring now to the drawings, and initially to FIG. 1, there is shown a block diagram of a computer system 10 arranged and configured to perform operations related to images. A CPU 12 is coupled to a device such as, for example, a digital camera 14 via, for example, a USB port. The digital camera 14 operates to download images stored locally on the camera 14, to the CPU 12. The CPU 12 stores the downloaded images in a memory 16 as image files 18. The image files 18 can be accessed by the CPU 12 for display on a monitor 20, or for print out on a printer 22.

Alternatively, the CPU 12 can be implemented as a microprocessor embedded in a device such as, for example, the digital camera 14 or a robot. The CPU 12 can also be equipped with a real time operating system for real time operations related to images, in connection with, for example, a robotic operation or an interactive operation with a user.

As shown in FIG. 2, each image file 18 comprises an n×m pixel array. Each pixel, p, is a picture element corresponding to a discrete portion of the overall image. All of the pixels together define the image represented by the image file 18. Each pixel comprises a digital value corresponding to a set of color bands, for example, red, green and blue color components (RGB) of the picture element. The present invention is applicable to any multi-band image, where each band corresponds to a piece of the electro-magnetic spectrum. The pixel array includes n rows of m columns each, starting with the pixel p(1,1) and ending with the pixel p(n, m). When displaying or printing an image, the CPU 12 retrieves the corresponding image file 18 from the memory 16, and operates the monitor 20 or printer 22, as the case may be, as a function of the digital values of the pixels in the image file 18, as is generally known.

According to a feature of the present invention, in an image process, the CPU 12 operates to analyze and process information, for example, the RGB values of the pixels of an image stored in an image file 18, to achieve various objectives, such as, for example, a correct and accurate identification of illumination and material aspects of the image. The present invention provides a spatio-spectral operator/constraint/solver model to identify the illumination and material components of an image.

To that end, as shown in FIG. 3, in step 1000, an image file 18 is input to the CPU 12. In step 1002, the CPU 12 performs an image segregation, according to features of the present invention, to generate intrinsic images that correspond to the image depicted in the input image file 18. The intrinsic images include, for example, an illumination image, to capture the intensity and color of light incident upon each point on the surfaces depicted in the image, and a material image, to capture reflectance properties of surfaces depicted in the image (the percentage of each wavelength of light a surface reflects).

A fundamental observation underlying a basic discovery of the present invention, is that an image comprises two components, material and illumination. All changes in an image are caused by one or the other of these components. Spatio-spectral information is information relevant to contiguous pixels of an image depicted in an image file 18, such as spectral relationships among contiguous pixels, in terms of color bands, for example RGB values of the pixels, and the spatial extent of the pixel spectral characteristics relevant to a characteristic of the image, such as, for example, a single material depicted in the image or illumination flux affecting the image. When one of material and illumination is known in an image, the other can be readily deduced.
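By way of illustration only, the deduction of one component from the other can be sketched as follows. The sketch assumes a per-band multiplicative relationship, image = material × illumination, which is a common intrinsic-image model and is an assumption here rather than a statement drawn from the passage above. The function name `deduce_other_component` and all sample values are hypothetical.

```python
def deduce_other_component(image, known):
    """Divide each band of each pixel of `image` by the matching band of `known`.

    Under the ASSUMED model image = material * illumination, passing the
    material image returns the illumination image, and vice versa.
    Bands of `known` must be non-zero.
    """
    return [
        [tuple(i / k for i, k in zip(pixel, known_pixel))
         for pixel, known_pixel in zip(image_row, known_row)]
        for image_row, known_row in zip(image, known)
    ]


# Hypothetical 1x2 image: a single material, second pixel in shadow.
material = [[(0.8, 0.4, 0.2), (0.8, 0.4, 0.2)]]
illumination = [[(1.0, 1.0, 1.0), (0.25, 0.3, 0.5)]]
image = [
    [tuple(m * l for m, l in zip(mp, lp)) for mp, lp in zip(mrow, lrow)]
    for mrow, lrow in zip(material, illumination)
]

# Knowing the material image, recover the illumination image.
recovered = deduce_other_component(image, material)
```

In this sketch the recovered illumination matches the illumination used to synthesize the image, up to floating point rounding.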

Moreover, the illumination flux includes an incident illuminant and an ambient illuminant. The spectra for the incident illuminant and the ambient illuminant can be different from one another. Thus, a spectral shift is caused by a shadow, i.e., a decrease of the intensity of the incident illuminant. The spectral shift can cause a variance in color of material depicted in the scene, from full shadow, through the shadow penumbra, to fully lit. Pursuant to a feature of the present invention, spectral shift phenomena are captured in spatio-spectral information. The spatio-spectral information includes a spectral ratio: a ratio based upon a difference in color or intensities between two areas of a scene depicted in an image, which may be caused by different materials (an object edge), an illumination change (illumination boundary) or both.
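As a hedged illustration of a spectral ratio, the sketch below uses the per-band form Dark / (Bright − Dark). That particular formula is an assumption adopted for this example, since the passage above does not fix a formula. The toy appearance model in `render`, and the illuminant and reflectance values, are likewise hypothetical. The sketch shows that, under this toy model, an illumination boundary yields the same ratio regardless of the material on which the shadow falls.

```python
def spectral_ratio(dark, bright):
    """Per-band ratio Dark / (Bright - Dark); an ASSUMED formulation."""
    return tuple(d / (b - d) for d, b in zip(dark, bright))


def render(reflectance, direct, ambient, lit_fraction):
    """Toy model: reflectance * (lit_fraction * direct + ambient), per band."""
    return tuple(c * (lit_fraction * d + a)
                 for c, d, a in zip(reflectance, direct, ambient))


direct = (1.0, 0.9, 0.8)     # hypothetical incident illuminant (RGB)
ambient = (0.1, 0.15, 0.3)   # hypothetical ambient illuminant (RGB)

ratios = []
for reflectance in [(0.8, 0.4, 0.2), (0.3, 0.6, 0.9)]:
    bright = render(reflectance, direct, ambient, 1.0)  # fully lit
    dark = render(reflectance, direct, ambient, 0.0)    # full shadow
    ratios.append(spectral_ratio(dark, bright))

# Both entries of `ratios` equal ambient / direct per band, because the
# reflectance cancels out of the ratio in this toy model.
```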

According to a further feature of the present invention, spatio-spectral operators are generated to facilitate the process for the segregation of illumination and material aspects of a scene depicted in the retrieved image file 18. Spatio-spectral operators comprise representations or characteristics of an image that encompass spatio-spectral information usable in the processing of material and illumination aspects of an image. The spatio-spectral operators are subject to constraints that define constraining spatio-spectral relationships between the operators, for input to a solver. The solver includes a mathematical processing engine that operates to obtain an optimized solution for the generation of an intrinsic image, such as a material image and/or an illumination image derived from the original image stored in the retrieved image file 18, as a function of the constraining relationships between the spatio-spectral operators.

Spatio-spectral operators include, for example, tokens, token map information, log chromaticity representation values, X-junctions, BIDR model representations, a boundary representation, and a texton histogram based pixel representation.

Pursuant to a feature of the present invention, a token is a connected region of an image wherein the pixels of the region are related to one another in a manner relevant to identification of image features and characteristics such as identification of materials and illumination. The use of tokens recognizes the fact that a particular set of material/illumination/geometric characteristics of an image extends beyond a single pixel, and therefore, while the image processing described herein can be done on a pixel level, tokens expedite a more efficient processing of image properties. The pixels of a token can be related in terms of either homogeneous factors, such as, for example, close correlation of color values among the pixels, or nonhomogeneous factors, such as, for example, differing color values related geometrically in a color space such as RGB space, commonly referred to as a texture. Exemplary embodiments of the present invention provide methods and systems to identify various types of homogeneous or nonhomogeneous tokens for improved processing of image files. The present invention utilizes spatio-spectral information relevant to contiguous pixels of an image depicted in an image file 18 to identify token regions.

According to one exemplary embodiment of the present invention, homogeneous tokens are each classified as either a Type A token, a Type B token or a Type C token. A Type A token is a connected image region comprising contiguous pixels that represent the largest possible region of the image encompassing a single material in the scene. A Type B token is a connected image region comprising contiguous pixels that represent a region of the image encompassing a single material in the scene, though not necessarily the maximal region corresponding to that material. A Type C token comprises a connected image region of similar image properties among the contiguous pixels of the token, for example, similar color and intensity, where similarity is defined with respect to a noise model for the imaging system used to record the image.

A linear token is a nonhomogeneous token comprising a connected region of the image wherein adjacent pixels of the region have differing color measurement values that fall within a cylinder in RGB space, from a dark end (in shadow) to a bright end (lit end), along a positive slope. The cylinder configuration is predicted by a bi-illuminant dichromatic reflection model (BIDR model), according to a feature of the present invention, when the color change is due to an illumination change forming a shadow (i.e. a decrease in the intensity of the incident illuminant as the interplay between the incident or direct illuminant and the ambient illuminant in the illumination field) over a single material of a scene depicted in the image.
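For illustration, the notion of a Type C token as a connected region of similar color can be sketched with a simple region-growing pass. This is a simplified stand-in and not the routine of FIG. 6 a: the noise model is replaced by a fixed per-band tolerance `tol`, 4-connectivity is assumed, and each pixel is compared against the seed pixel of its region. The function name `find_type_c_tokens` and the sample image are hypothetical.

```python
from collections import deque


def find_type_c_tokens(image, tol=0.05):
    """Label 4-connected regions whose pixels lie within `tol` of a seed color.

    `image` is a list of rows of color tuples. Returns (labels, count), where
    labels[r][c] is the token id assigned to pixel (r, c).
    """
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue  # pixel already belongs to a token
            seed = image[r][c]
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] == -1
                            and all(abs(a - b) <= tol
                                    for a, b in zip(image[ny][nx], seed))):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
            count += 1
    return labels, count


red, blue = (0.9, 0.1, 0.1), (0.1, 0.1, 0.9)
labels, count = find_type_c_tokens([[red, red, blue],
                                    [red, red, blue]])
# labels == [[0, 0, 1], [0, 0, 1]] and count == 2
```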

For purposes of describing, identifying and using linear tokens, the BIDR model can be stated as: $I_{(x, y, z, \theta, \phi, \lambda)} = c_b(\lambda)\, l_d(\lambda)\, \gamma_b + M_a(\lambda)\, c_b(\lambda)$, where: $I_{(x, y, z, \theta, \phi, \lambda)}$ is the radiance of a surface point at (x, y, z) in the direction θ, φ for the wavelength λ, $c_b(\lambda)$ is the geometry independent body reflectance of a surface for the wavelength λ, $l_d(\lambda)$ is the incident illuminant for the wavelength λ, $\gamma_b$ is the product of a shadow factor $s_{x, y, z}$ and a geometric factor $m_b(\theta_i)$, and $M_a(\lambda)$ is the integral of the ambient illuminant and geometric body reflectance over a hemisphere, excluding the incident illuminant. For more detailed information on the BIDR model, reference should be made to U.S. application Ser. No. 11/341,751, filed Jan. 27, 2006, entitled: “Bi-illuminant Dichromatic Reflection Model For Image Manipulation,” published as US 2007/0176940 on Aug. 2, 2007.
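The BIDR expression above can be evaluated numerically to see why the colors of a single material under varying shadow fall along a line in RGB space. In the following sketch, each wavelength-dependent term is sampled at three bands (RGB) and treated as a per-band scalar, which is a simplifying assumption. The function name `bidr_radiance` and the numeric values are hypothetical.

```python
def bidr_radiance(c_b, l_d, m_a, gamma_b):
    """Evaluate I = c_b * l_d * gamma_b + M_a * c_b independently per band."""
    return tuple(c * l * gamma_b + m * c for c, l, m in zip(c_b, l_d, m_a))


c_b = (0.6, 0.5, 0.2)    # hypothetical body reflectance per band
l_d = (1.0, 0.9, 0.7)    # hypothetical incident illuminant per band
m_a = (0.1, 0.15, 0.3)   # hypothetical ambient term per band

dark = bidr_radiance(c_b, l_d, m_a, 0.0)    # gamma_b = 0: full shadow
mid = bidr_radiance(c_b, l_d, m_a, 0.5)     # penumbra
bright = bidr_radiance(c_b, l_d, m_a, 1.0)  # gamma_b = 1: fully lit
```

Because the expression is linear in $\gamma_b$, the penumbra color `mid` is the band-wise midpoint of `dark` and `bright`, and every band increases from the dark end to the bright end, consistent with the positive slope described for linear tokens.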

Token map information indicates locations of tokens within an image, relative to one another. The map information is used to identify neighboring tokens for performing an analysis of token neighbor relationships relevant to constraining spatio-spectral relationships between tokens, for input to the solver.

Log chromaticity representation values provide illumination invariant values for pixels of the image. Logarithmic values of the color band values of the image pixels are plotted on a log-color space graph. The logarithmic values are then projected to a log-chromaticity projection plane oriented as a function of the BIDR model. The chromaticity plane values are substituted for the color band values (for example, RGB values) of each pixel. For more detailed information on log chromaticity representation values, reference should be made to U.S. application Ser. No. 11/403,719, filed Apr. 13, 2006, entitled: “Method And System For Separating Illumination And Reflectance Using a Log Color Space,” published as US 2007/0242878 on Oct. 18, 2007, and now issued as U.S. Pat. No. 7,596,266 on Sep. 29, 2009.
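The projection step can be sketched as follows, by way of example only. The sketch takes the log of each band, then removes the component along a unit normal to the projection plane. For simplicity it returns the projected point as a three-component vector lying in the plane rather than as two-dimensional plane coordinates, and it derives the normal from a synthetic lit/shadow pair of one material; both choices are assumptions made for the illustration. The names `log_chromaticity` and `unit` and the sample values are hypothetical, and all band values must be positive.

```python
import math


def unit(vec):
    """Scale `vec` to unit length."""
    length = math.sqrt(sum(a * a for a in vec))
    return tuple(a / length for a in vec)


def log_chromaticity(rgb, normal):
    """Project log(rgb) onto the plane through the origin with unit `normal`."""
    v = [math.log(band) for band in rgb]
    along = sum(a * n for a, n in zip(v, normal))
    return tuple(a - along * n for a, n in zip(v, normal))


direct = (1.0, 0.9, 0.7)     # hypothetical incident illuminant
ambient = (0.1, 0.15, 0.3)   # hypothetical ambient illuminant


def lit_and_shadow(reflectance):
    lit = tuple(c * (d + a) for c, d, a in zip(reflectance, direct, ambient))
    shadow = tuple(c * a for c, a in zip(reflectance, ambient))
    return lit, shadow


lit_a, shadow_a = lit_and_shadow((0.6, 0.5, 0.2))
# Assumed normal: the direction of log(lit) - log(shadow) for one material.
normal = unit(tuple(math.log(l) - math.log(s)
                    for l, s in zip(lit_a, shadow_a)))

p_lit = log_chromaticity(lit_a, normal)
p_shadow = log_chromaticity(shadow_a, normal)
```

In this toy setting the lit and shadowed pixels of the same material project to the same point, which is the sense in which the projected values are illumination invariant.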

An X-junction is an area of an image where a material edge and an illumination boundary cross one another. An X-junction is an optimal location for an accurate analysis of material and illumination aspects of an image.

A boundary representation is an arrangement of pixels, on each side of a boundary, formed by, for example, adjacent Type B tokens. The arrangement is used to facilitate an analysis of the boundary to classify the boundary as a material boundary on a smooth surface (as opposed to another type of boundary, for example, an illumination edge, depth boundary or simultaneous illumination and material change). The pixel representation is configured to provide samples of pixels within each of the Type B tokens forming the boundary. The pixels of the samples are subject to spatio-spectral analysis, and the results are compared to determine the likelihood that the respective boundary corresponds to a material change.

A texton is a homogeneous representation for a region of an image that comprises a texture. Image texture can be defined as a function of spatial variation in pixel intensities. Image texture patterns are frequently the result of physical or reflective properties of the image surface. Commonly, an image texture is associated with spatial homogeneity and typically includes repeated structures, often with some random variation (e.g., random positions, orientations or colors). Image textures are also often characterized by certain visual properties such as regularity, coarseness, contrast and directionality. An example of image texture is the image of a zebra skin surface as it appears to be spatially homogenous and seems to contain variations of color intensities which form certain repeated patterns. Some image textures can be defined by geometric characteristics, such as stripes or spots. A texton based operator transforms patterns of differing reflectance caused by a textured material into a homogeneous representation that captures the spectral and spatial characteristics of the textured region in the image.

Constraints comprise, for example, an anchor constraint, a same illumination constraint, a smooth illumination constraint, a Type B token constraint, a Linear token constraint, a BIDR enforcement constraint, a same texton histogram constraint, a log chromaticity similarity constraint, an X junction constraint, and a boundary representation constraint. Each constraint is configured as a constraint generator software module that defines the spatio-spectral operators utilized by the respective constraint and provides an expression of the constraining relationship imposed upon the constituent operators.

An anchor constraint utilizes a number of brightest/largest Type C tokens in an image. The constraining relationship is that the material of the selected brightest/largest Type C tokens is constrained to be an absolute value for the color/brightness observed in the image. The constraint anchors a material map for the image at an absolute brightness to avoid relative brightness constraints.

A same illumination constraint utilizes Type C tokens and Type B tokens identified in an image and token map information. The constraining relationship is that adjacent Type C tokens, as indicated by the token map information, are at the same illumination, unless the adjacent Type C tokens are part of the same Type B token. The term “same” in connection with the term “illumination” is used to mean an average value with respect to a noise model for the imaging system used to record the image. This constrains any observed differences in appearance between adjacent Type C tokens, that are not part of the same Type B token, to be a material change, as will appear.

A smooth illumination constraint is similar to the same illumination constraint. However, rather than constraining all pixels of adjacent Type C tokens to be of the same illumination, as in the same illumination constraint, in the smooth illumination constraint, the constraint is based upon the average illumination of the pixels near a shared boundary between adjacent Type C tokens. This constrains the illumination field to be somewhat smooth, as opposed to piecewise constant (the same, as defined above) throughout a token.

A Type B token constraint also utilizes Type C tokens and Type B tokens. However, the constraining relationship is that all Type C tokens that are part of the same Type B token are constrained to be of the same material. This constraint enforces the definition of a Type B token, that is, a connected image region comprising contiguous pixels that represent a region of the image encompassing a single material in the scene, though not necessarily the maximal region corresponding to that material. Thus, all Type C tokens that lie within the same Type B token are by the definition imposed upon Type B tokens, of the same material, though not necessarily of the same illumination. The Type C tokens are therefore constrained to correspond to observed differences in appearance that are caused by varying illumination.

Accordingly, the Type B token constraint is complementary to the same and smooth illumination constraints, which, as opposed to illumination change, constrain observed differences to correspond to material change, as described above. This is due to the fact that in each of the same and smooth illumination constraints, Type C tokens that are adjacent and not part of the same Type B token, are constrained to the same illumination. These Type C tokens should comprise different materials, since by the constraint, they are not in the same Type B token and therefore, by the definition of Type B tokens enforced by the constraint, do not encompass a single material, so illumination should be a constant, and any observed difference is considered as attributable to a material change.

To summarize, pursuant to a feature of the present invention, the Type C and Type B token spatio-spectral operators are defined to provide characteristics of an image that enable segregation of illumination and material. Type C tokens each comprise a connected image region of similar image properties, for example similar color, as recorded and stored in an image file 18. Thus, adjacent Type C tokens indicate some form of change in the image or else they would form the same Type C token. Type B tokens encompass a single material. The complementary constraints of the same/smooth illumination constraints and the Type B token constraint enforce relationships between the tokens that indicate either a material change or an illumination change.

If the adjacent Type C tokens are within the same Type B token, as in the Type B token constraint, the differences between them should correspond to illumination change due to the same-material property of the common Type B token. If the adjacent Type C tokens are not within the same Type B token, as in the same/smooth illumination constraints, the difference between them should then correspond to a material change since they are not both defined by a common, single material Type B token.
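The complementary relationship between the Type B token constraint and the same illumination constraint can be summarized in a short sketch. Here the token map information is reduced to a list of adjacent Type C token pairs, and Type B membership is a dictionary from Type C token id to Type B token id. These data structures, the constraint labels and the function name `build_constraints` are all hypothetical choices made for the illustration, not the constraint generator modules themselves.

```python
def build_constraints(adjacent_pairs, type_b_of):
    """Assign one constraint to each adjacent pair of Type C tokens.

    Pairs inside a common Type B token are constrained to the same material
    (their difference is attributed to illumination). All other adjacent
    pairs are constrained to the same illumination (their difference is
    attributed to a material change).
    """
    constraints = []
    for a, b in adjacent_pairs:
        shared = type_b_of.get(a)
        if shared is not None and shared == type_b_of.get(b):
            constraints.append(("same_material", a, b))
        else:
            constraints.append(("same_illumination", a, b))
    return constraints


adjacent_pairs = [(0, 1), (1, 2), (2, 3)]
type_b_of = {0: "B0", 1: "B0", 2: "B1"}  # token 3 is in no Type B token
constraints = build_constraints(adjacent_pairs, type_b_of)
# [("same_material", 0, 1), ("same_illumination", 1, 2),
#  ("same_illumination", 2, 3)]
```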

A Linear token constraint utilizes Type C tokens and Linear tokens. The constraining relationship is that a difference between two Type C tokens, spaced by a Linear token, approximately equals a characteristic illuminant spectral ratio for the image. As defined, a Linear token follows a cylinder configuration along a positive slope, through color space. The BIDR model predicts that the positive slope equals a characteristic illuminant spectral ratio for the image. Thus, the color difference between two Type C tokens, one at each of the dark end and bright end of a Linear token, should reflect the value of the respective characteristic illuminant spectral ratio for the image.

A BIDR enforcement constraint utilizes Type C tokens and a BIDR model defined normal vector for the log-chromaticity projection plane. The constraining relationship is that the illumination for all Type C tokens in a local patch of the image forms a set of parallel lines in log-color space, the orientation of the parallel lines being defined by the BIDR model defined normal vector. The constraint therefore enforces the illumination field present in the image to explicitly fit the BIDR model prediction for the illumination.

Thus, each of the Linear token constraint and the BIDR enforcementconstraint utilize BIDR model predictions as a basis to segregateillumination and material aspects of an image. The BIDR model predicts acolor change in an image when the color change is due to an illuminationchange forming a shadow (i.e. a decrease in the intensity of theincident illuminant as the interplay between the incident or directilluminant and the ambient illuminant in the illumination field) over asingle material of a scene depicted in the image. The color changeprediction of the BIDR model accurately constrains all color bandvariations among Type C tokens to illumination field effects occurringin an image by operating as a function of the interplay between thespectral variations occurring between incident illuminant and ambientilluminant components of the illumination field. Thus, BIDR model basedconstraints couple all color band variations into one integralconstraining relationship.

A same texton histogram constraint utilizes Type C tokens and textonhistogram operators identified for texture regions within an image. Atexton analysis is utilized wherein each pixel of the image (or pixelsof those regions of an image identified as comprising a texture) fromthe recorded color band representation of the respective image file 18,such as, for example, RGB color band values, is converted to a two bandrepresentation wherein the two bands comprise a texton label and atexton histogram label. The two band representations are then used toidentify texture tokens, as will be described below. A constraint can beimposed that all Type C tokens within the same texture token are of thesame mean material.

A log chromaticity similarity constraint utilizes Type C tokens and logchromaticity representation values. The constraining relationship isthat those Type C tokens having pixels with similar log chromaticityrepresentation values are constrained to a same color value, withobserved differences being attributed to variations in the illuminationfield.

An X-junction constraint utilizes Type C tokens and X-junctionoperators. As noted above, an X-junction is an area of an image where amaterial edge and an illumination boundary cross one another.X-junctions are typically identified by four Type C tokens, two pairs ofsame-material Type C tokens forming the material edge, with eachsame-material pair including an illumination boundary dividing therespective same material into lit and shadowed pairs of Type C tokens.The constraining relationship: 1) a Type B token constraint is imposedbetween each same-material pair of Type C tokens forming the X-junction(those with an illumination boundary between them), and 2) a sameillumination constraint is imposed between each pair of Type C tokensforming the material edge of the X-junction. For a more detaileddescription of X-junctions and the relationships of constituent tokens,reference should be made to U.S. application Ser. No. 11/341,742, filedJan. 27, 2006, entitled: “Method And System For Identifying IlluminationFlux In An Image,” published as US 2006/0177149 on Aug. 10, 2006, nowissued as U.S. Pat. No. 7,672,530 on Mar. 2, 2010.

A boundary representation constraint is defined by a standard ratioconstraint. An analysis performed on a boundary representation, whenindicating a material change, provides an estimate of the ratio ofcolors between two adjacent regions defined by the boundary, forexample, the adjacent Type B tokens, even when the illumination variesover the regions. The constraint states that the ratio of the colors oftwo adjacent regions is X. The boundary representation analysis isexecuted at the level of Type B tokens, to classify a boundary as beingcaused by a material change, then propagated down to the level of theconstituent Type C tokens. For a more detailed description of a boundaryanalysis, at the Type B token level, reference should be made to U.S.application Ser. No. 12/079,878, filed Mar. 28, 2008, entitled “Systemand Method For Illumination Invariant Image Segmentation”, published asU.S. 2009/0245680 A1 on Oct. 1, 2009, and issued as U.S. Pat. No.8,175,390 on May 8, 2012.

According to a feature of the present invention, the boundaryrepresentation constraint states that all adjacent pairs of Type Ctokens along the boundary, (one Type C token on each side of theboundary, and all of the Type C tokens being within the Type B tokensforming the respective boundary), have colors that satisfy the ratio X,as indicated by the boundary representation analysis.

According to a preferred embodiment of the present invention, each ofthe above described constraints can be classified into one of threebasic types of constraints, an absolute material color constraint, asame-material constraint and a relative reflectance constraint. Theabsolute material constraint constrains the material at a particularlocation of an image to be a certain color, as implemented in, forexample, the anchor constraint. The same-material constraint constrainsoperators relevant to an image (for example, two pixels or Type Ctokens) to be of the same material. The same-material type of constraintcan be implemented in, for example, Type B, X-junction, log chromaticitysimilarity, same texton histogram and linear token constraints. Therelative reflectance constraint constrains operators relevant to animage (for example, two pixels or Type C tokens) to have a similarity ofreflectance characteristics, such as defined by smooth illumination andsame illumination constraints, and which can be specified by X-junction,and boundary representation constraints.

An exemplary solver according to a feature of the present invention comprises a mathematical processing engine for executing an optimizing function, for example, optimization of results in an equation expressed by: [A][x]=[b], where [A] is a matrix of values that are to be satisfied by (and therefore, taken as solved for by) the definitions of the operator(s) and the constraining relationship(s) for the operator(s), as indicated by selected constraint(s), [x] is a matrix of variables for which the equation is finding an optimal solution, for example, one of an illumination or material component of an image component, for example, a pixel or token, and [b] is a matrix of values observed in an image selected for processing, for example, the recorded values for the RGB color bands of each pixel of an image file 18. The optimizing equation can be implemented in a mathematical optimizing function selected from a set of known optimization solvers such as, for example, known convex optimization operations such as a least squares solver, or a preconditioned conjugate gradient solver.
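By way of illustration only, the following minimal sketch shows how an overdetermined [A][x]=[b] system might be solved in the least squares sense. The function name solve_least_squares, the two-token example and the numeric values are hypothetical and are not taken from the specification. The sketch uses the normal equations with Gaussian elimination and assumes [A] has full column rank.

```python
def solve_least_squares(A, b):
    """Minimize ||A x - b||^2 via the normal equations (A^T A) x = A^T b.

    Assumes A has full column rank; a sketch, not a production solver.
    """
    rows, cols = len(A), len(A[0])
    ata = [[sum(A[k][i] * A[k][j] for k in range(rows)) for j in range(cols)]
           for i in range(cols)]
    atb = [sum(A[k][i] * b[k] for k in range(rows)) for i in range(cols)]

    # Gaussian elimination with partial pivoting on the augmented matrix.
    aug = [ata[i] + [atb[i]] for i in range(cols)]
    for c in range(cols):
        pivot = max(range(c, cols), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, cols):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, cols + 1):
                aug[r][k] -= factor * aug[c][k]

    # Back substitution.
    x = [0.0] * cols
    for c in range(cols - 1, -1, -1):
        tail = sum(aug[c][k] * x[k] for k in range(c + 1, cols))
        x[c] = (aug[c][cols] - tail) / aug[c][c]
    return x


# Hypothetical example: two unknown material values x[0], x[1].
# Row 1: same-material style relationship, x[0] - x[1] = 0.
# Rows 2-3: two conflicting observations, x[0] = 1.0 and x[1] = 1.2.
A = [[1.0, -1.0],
     [1.0, 0.0],
     [0.0, 1.0]]
b = [0.0, 1.0, 1.2]
x = solve_least_squares(A, b)
```

In this toy case no exact solution exists, so the solver returns a compromise that pulls the two values toward one another while staying near the observations.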

According to a further feature of the present invention, factorsincluding bounds, are introduced in a solver operation, in addition toconstraining relationships, as a function of real world illumination andmaterial phenomena, to keep material/illumination values withinphysically plausible ranges, such as a limit 1, limit infinity solver(L₁, L_(∞)), a bounded least squares solver, or a bounded L₁, L_(∞)solver, as will be described below.

FIG. 4 shows a functional block diagram of an image segregation systemarchitecture, implemented in, for example, the computer system of FIG.1, according to a feature of the present invention. Alternatively, thefunctional blocks of FIG. 4 can be implemented in a dedicated hardwarecircuit arranged to perform the functionality of the blocks of FIG. 4.An image 32 (as depicted in an image file 18) is input to apreprocessing block 33. The preprocessing block 33 can perform suchfunctions as correction of chromatic aberration in the image 32,combining multiple images to provide a high dynamic range image,linearize pixel data for the image, and so on, for an image optimizedfor processing. The pre-processed image is then input to a Type Ctokenization block 35 which operates to identify Type C tokens in thepre-processed image, in the manner described below. Type C tokens arecommon to many of the constraints utilized in exemplary embodiments ofthe present invention, thus, an initial identification of Type C tokensfor an input image 32 expedites further processing.

In an exemplary embodiment of the present invention, the CPU 12 executescode to implement both the preprocessing block 33 and the Type Ctokenization block 35, as well as a service provider 24, that functionsas a central agent and caching structure (configured in the memory 16),to handle an image for processing according to the teachings of thepresent invention. The service provider 24 receives and stores thepre-processed image and related Type C token information from the Type Ctokenization block 35, and is coupled to an operators block 28 (executedby the CPU 12) arranged to generate any other operators for the imagerequired by selected constraints, as will appear. The service provider24 is also coupled to a global features extraction input 29. The globalfeatures extraction input 29 can be used to provide the system withinformation relevant to an image being processed, such as an indicationof light source when the image was taken (sunlight, fluorescent light,incandescent light), time of day, location, domain knowledge, such asinformation relevant to the nature of the image, such as interior,exterior, buildings, lawns with green grass, trees with leaves in bloom,etc., and any other parameters relevant to image processing. The serviceprovider 24 stores the global features extraction input 29 with arelated input image 32.

A constraint builder 26 is coupled to the service provider 24. Theconstraint builder 26 uses a constraint generator library (configuredwithin the memory 16) that stores the constraint generator softwaremodules for the various constraints described above. The serviceprovider 24 and constraint builder 26 operate to arrange spatio-spectraloperators relevant to the pre-processed image, according to selectedones of the constraint generator software modules, in for example, the[A][x]=[b] matrix equation.

A solver 30 (executed by the CPU 12) is coupled to the constraintbuilder 26, and implements an optimization operation, as describedabove, for an optimal solution for the [A][x]=[b] matrix equation, foruse in generating intrinsic images from the pre-processed image. Thesolver 30 is also coupled to a post-processing block 36 (executed by theCPU 12) for certain post-processing operations. The post-processingoperations can include, for example, monotonicity maintenance. Inmonotonicity maintenance, if two large regions exhibit a lineartransition in the input image 32, the transition should remain a lineartransition in the output intrinsic image 34. Post-processing can alsoinclude illumination propagation, that serves to fill in holes left bythe solver 30, illumination-map based white balancing and otherfiltering, smoothing processes. The post-processing block 36 outputsintrinsic images 34.

Referring now to FIG. 5, there is shown a graphical user interface (GUI)for use in connection with an exemplary implementation of the imagesegregation system architecture feature of the present invention. TheGUI of FIG. 5 is displayed on the monitor 20 of the computer system 10by the service provider 24 for a user to select a desired imagesegregation operation. The upper left hand corner of the GUI indicatesOpen Image, Crop Image, Show Parameters, and Segregate selectionindicators. A user can move and click a cursor on a desired selectorindicator. The Open Image indicator lists all image files 18 currentlystored in the memory 16 and enables the user to select an image forprocessing. The selected image is input 32 (see FIG. 4) to the serviceprovider 24 (via the preprocessing block 33 and the Type C tokenizationblock 35) which operates to display the selected image at the uppercenter of the monitor 20 (FIG. 5).

A material image derived by operation of the exemplary segregationsystem from the selected image is output 34 (see FIG. 4) after executionof the image segregation processing by the solver 30 and displayed atthe lower right hand of the monitor 20 (FIG. 5). The derivedillumination image is displayed at the lower right hand of the monitor20 (FIG. 5).

According to a feature of the present invention, the Crop Image selectorpermits a user to crop a selected image so as to process a portion ofthe overall image. The Show Parameter selector displays parametersrelated to the selected image file 18. Parameters for each image file 18can be stored in a parameter data file associated with a correspondingimage file 18, and include any parameters relevant to the processing ofthe image depicted in the associated image file 18, for example theglobal features extraction input 29. Parameters can include any datarelevant to image processing such as, for example, any variable forpixel analysis by the CPU 12, as for example, in the generation ofspatio-spectral operators, and domain knowledge, such as informationrelevant to the nature of the image, such as interior, exterior,buildings, lawns with green grass, trees with leaves in bloom, etc.

Below the selection indicators is a list of each of the optimizingfunctions that can be used as the solver 30, and a further list of eachof the constraint generators contained in the constraint generatorlibrary of the constraint builder 26. A user selects a desiredmathematical operation and one or more of the constraints to be imposedupon the selected image. After selection of the image to be processed,the constraints to be imposed and the mathematical operation to beexecuted, the user can click on the Segregate indicator to commenceimage segregation processing.

Upon commencement of the image segregation processing, the serviceprovider 24 retrieves the constraint generator software modules for theselected constraints to identify the spatio-spectral operators utilizedby the selected constraints. Any spatio-spectral operators not alreadystored by the service provider 24 are generated by the operators block28, for the image being segregated, and the service provider 24 cachesthe results. The cached results can be reused in any subsequentoperation for a selected image, with the same set of associatedparameters.

For example, if the selected constraint is a same illuminationconstraint, the service provider 24 identifies Type C tokens, Type Btokens and a token map for the selected image. The Type C tokens weregenerated by the Type C tokenization block 35. The service provider 24operates the operators block 28 to generate the remaining operatorsspecified by the same illumination constraint.

Referring now to FIG. 6 a, there is shown a flow chart for generatingType C token regions in the image file of FIG. 2, according to a featureof the present invention. Type C tokens can be readily identified in animage by the Type C tokenization block 35, utilizing the steps of FIG. 6a. The operators block 28 can then analyze and process the Type C tokensto construct Type B tokens when specified by a selected constraint, aswill appear.

Prior to execution of the routine of FIG. 6 a, the CPU 12 can operate tofilter the image depicted in a subject image file 18. The filters mayinclude an image texture filter, to, for example, transform patterns ofdiffering reflectance caused by a textured material into a homogeneousrepresentation that captures the spectral and spatial characteristics ofthe textured region in the image. Identification of Type B tokens can bedifficult in an image texture. A textured image contains materials with,for example, more than one reflectance function that manifests as adefining characteristic. For example, the defining characteristic can bea pattern of colors within the texture, such that the texture displays acertain distribution of colors in any patch or region selected fromanywhere within the textured region of the image.

A 1^(st) order uniform, homogeneous Type C token comprises a single robust color measurement among contiguous pixels of the image. At the start of the identification routine of FIG. 6 a, the CPU 12 (executing as the Type C tokenization block 35) sets up a region map in memory. In step 100, the CPU 12 clears the region map and assigns a region ID, which is initially set at 1. An iteration for the routine, corresponding to a pixel number, is set at i=0, and a number for an N×N pixel array, for use as a seed to determine the token, is set to an initial value, N=N_(start). N_(start) can be any integer >0, for example it can be set at 11 or 15 pixels.

At step 102, a seed test is begun. The CPU 12 selects a first pixel, i=(1, 1) for example (see FIG. 2 a), the pixel at the upper left corner of a first N×N sample of the image file 18. The pixel is then tested in decision block 104 to determine if the selected pixel is part of a good seed. The test can comprise a comparison of the color value of the selected pixel to the color values of a preselected number of its neighboring pixels as the seed, for example, the N×N array. The color values comparison can be with respect to multiple color band values (RGB in our example) of the pixel or the filter output intensity histogram representation of the pixel, in the event the image was filtered for texture regions, as described above. If the comparison does not result in approximately equal values (for example, within the noise levels of the recording device for RGB values) for the pixels in the seed, the CPU 12 increments the value of i (step 106), for example, i=(1, 2), for a next N×N seed sample, and then tests to determine if i=i_(max) (decision block 108).

If the pixel value is at i_(max), a value selected as a threshold for deciding to reduce the seed size for improved results, the seed size, N, is reduced (step 110), for example, from N=15 to N=12. In an exemplary embodiment of the present invention, i_(max) can be set at i=(n, m). In this manner, the routine of FIG. 6 a parses the entire image at a first value of N before repeating the routine for a reduced value of N.

After reduction of the seed size, the routine returns to step 102, andcontinues to test for token seeds. An N_(stop) value (for example, N=2)is also checked in step 110 to determine if the analysis is complete. Ifthe value of N is at N_(stop), the CPU 12 has completed a survey of theimage pixel arrays and exits the routine.

If the value of i is less than i_(max), and N is greater than N_(stop),the routine returns to step 102, and continues to test for token seeds.

When a good seed (an N×N array with approximately equal pixel values) isfound (block 104), the token is grown from the seed. In step 112, theCPU 12 pushes the pixels from the seed onto a queue. All of the pixelsin the queue are marked with the current region ID in the region map.The CPU 12 then inquires as to whether the queue is empty (decisionblock 114). If the queue is not empty, the routine proceeds to step 116.

In step 116, the CPU 12 pops the front pixel off the queue and proceedsto step 118. In step 118, the CPU 12 marks “good” neighbors around thesubject pixel, that is neighbors approximately equal in color value tothe subject pixel, with the current region ID. All of the marked goodneighbors are placed in the region map and also pushed onto the queue.The CPU 12 then returns to the decision block 114. The routine of steps114, 116, 118 is repeated until the queue is empty. At that time, all ofthe pixels forming a token in the current region will have beenidentified and marked in the region map as a Type C token. In the eventthe pixels comprise intensity histogram representations, the token canbe marked as Type C_(T).

When the queue is empty, the CPU 12 proceeds to step 120. At step 120,the CPU 12 increments the region ID for use with identification of anext token. The CPU 12 then returns to step 106 to repeat the routine inrespect of the new current token region.
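The seed test and queue-based growth of steps 102 to 120 can be sketched as follows. This is a simplified illustration under stated assumptions, not the routine of FIG. 6 a itself: the helper names (similar, is_good_seed, grow_token, find_type_c_tokens), the tolerance tol standing in for the recording device noise level, the 4-connected neighborhood, the use of 0 for unassigned pixels, and the rule that a seed may not reuse already assigned pixels are all hypothetical choices.

```python
from collections import deque


def similar(p, q, tol):
    """True when two color tuples agree within tol in every band."""
    return all(abs(a - b) <= tol for a, b in zip(p, q))


def is_good_seed(img, region_map, r, c, n, tol):
    """An n x n array is a good seed if unassigned and nearly uniform."""
    cells = [(y, x) for y in range(r, r + n) for x in range(c, c + n)]
    if any(region_map[y][x] for y, x in cells):
        return False  # assumption: seeds do not overlap existing tokens
    return all(similar(img[y][x], img[r][c], tol) for y, x in cells)


def grow_token(img, region_map, r, c, n, region_id, tol):
    """Push the seed onto a queue, then absorb good neighbors until empty."""
    h, w = len(img), len(img[0])
    queue = deque()
    for y in range(r, r + n):
        for x in range(c, c + n):
            region_map[y][x] = region_id
            queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and region_map[ny][nx] == 0
                    and similar(img[ny][nx], img[y][x], tol)):
                region_map[ny][nx] = region_id
                queue.append((ny, nx))


def find_type_c_tokens(img, seed_sizes=(3, 2), tol=2):
    """Scan the image at each seed size, from largest to smallest."""
    h, w = len(img), len(img[0])
    region_map = [[0] * w for _ in range(h)]
    region_id = 1
    for n in seed_sizes:
        for r in range(h - n + 1):
            for c in range(w - n + 1):
                if is_good_seed(img, region_map, r, c, n, tol):
                    grow_token(img, region_map, r, c, n, region_id, tol)
                    region_id += 1
    return region_map


# Hypothetical 4 x 6 image: left half blue, right half teal.
blue, teal = (20, 40, 200), (20, 160, 160)
image = [[blue] * 3 + [teal] * 3 for _ in range(4)]
region_map = find_type_c_tokens(image)
```

Here the region map ends with region ID 1 over the blue half and region ID 2 over the teal half. Real images would use larger seed sizes, such as the 11 or 15 pixel values mentioned above.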

Upon arrival at N=N_(stop), step 110 of the flow chart of FIG. 6 a, or completion of a region map that coincides with the image, the routine will have completed the token building task. FIG. 6 b is an original image used as an example in the identification of tokens. The image shows areas of the color blue and the blue in shadow, and of the color teal and the teal in shadow. FIG. 6 c shows token regions corresponding to the region map, for example, as identified through execution of the routine of FIG. 6 a (Type C tokens), in respect to the image of FIG. 6 b. The token regions are color coded to illustrate the token makeup of the image of FIG. 6 b, including penumbra regions between the full color blue and teal areas of the image and the shadow of the colored areas.

Upon completion of the routine of FIG. 6 a by the Type C tokenizationblock 35, the service provider 24 stores the Type C token regioninformation for the selected image. Prior to commencing any process togenerate Type B tokens from the identified Type C tokens, the operatorsblock 28 tests each identified Type C token to make certain that eachType C token encompasses a single material. While each Type C tokencomprises a region of the image having a single robust color measurementamong contiguous pixels of the image, the token may grow across materialboundaries.

Typically, different materials connect together in one Type C token viaa neck region often located on shadow boundaries or in areas withvarying illumination crossing different materials with similar hue butdifferent intensities. A neck pixel can be identified by examiningcharacteristics of adjacent pixels. When a pixel has two contiguouspixels on opposite sides that are not within the corresponding token,and two contiguous pixels on opposite sides that are within thecorresponding token, the pixel is defined as a neck pixel.
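The neck pixel definition above can be illustrated with a short sketch. It assumes a 4-connected neighborhood in which the "opposite sides" are the left/right and up/down pairs, and it treats positions outside the image as not within the token. The function names is_neck_pixel and find_neck_pixels are hypothetical.

```python
def is_neck_pixel(region_map, y, x):
    """True if one opposite pair of neighbors is inside the token
    and the other opposite pair is outside it."""
    h, w = len(region_map), len(region_map[0])
    token_id = region_map[y][x]

    def inside(yy, xx):
        # Out-of-image positions count as outside the token (assumption).
        return 0 <= yy < h and 0 <= xx < w and region_map[yy][xx] == token_id

    horiz = (inside(y, x - 1), inside(y, x + 1))
    vert = (inside(y - 1, x), inside(y + 1, x))
    return (all(horiz) and not any(vert)) or (all(vert) and not any(horiz))


def find_neck_pixels(region_map, token_id):
    """List every neck pixel of the token with the given region ID."""
    return [(y, x)
            for y, row in enumerate(region_map)
            for x, value in enumerate(row)
            if value == token_id and is_neck_pixel(region_map, y, x)]


# Hypothetical token: two 3 x 2 lobes joined by a one-pixel bridge.
region_map = [
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 0, 1, 1],
]
necks = find_neck_pixels(region_map, 1)
```

In this example only the bridge pixel at row 1, column 2 qualifies, since its left and right neighbors belong to the token while its upper and lower neighbors do not.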

FIG. 7 shows a flow chart for a neck test for Type C tokens. In step 122, the CPU 12 examines each pixel of an identified token to determine whether any of the pixels under examination forms a neck. The routine of FIG. 7 can be executed as a subroutine directly after a particular token is identified during execution of the routine of FIG. 6 a. All pixels identified as a neck are marked as “ungrowable.” In decision block 124, the CPU 12 determines if any of the pixels were marked.

If no, the CPU 12 exits the routine of FIG. 7 and returns to the routineof FIG. 6 a (step 126).

If yes, the CPU 12 proceeds to step 128 and operates to regrow the tokenfrom a seed location selected from among the unmarked pixels of thecurrent token, as per the routine of FIG. 6 a, without changing thecounts for seed size and region ID. During the regrowth process, the CPU12 does not include any pixel previously marked as ungrowable. After thetoken is regrown, the previously marked pixels are unmarked so thatother tokens may grow into them.

Subsequent to the regrowth of the token without the previously markedpixels, the CPU 12 returns to step 122 to test the newly regrown token.

Neck testing identifies Type C tokens that cross material boundaries,and regrows the identified tokens to provide single material Type Ctokens suitable for use in creating Type B tokens. FIG. 6 d shows Type Btokens generated from the Type C tokens of FIG. 6 c, according to afeature of the present invention. The present invention provides severalexemplary techniques of pixel characteristic analysis for constructingType B tokens from Type C tokens. One exemplary technique involvesarbitrary boundary removal. The arbitrary boundary removal technique canbe applied to Type C tokens whether they were generated using N colorband values (RGB in our example) of the pixel or the filter outputrepresentation of the pixel, in the event the image was filtered. Actualboundaries of any particular Type C token will be a function of the seedlocation used to generate the token, and are thus, to some extentarbitrary. There are typically many potential seed locations for eachparticular token, with each potential seed location generating a tokenwith slightly different boundaries and spatial extent because ofdifferences among the color values of the pixels of the various seeds,within the noise ranges of the recording equipment.

FIG. 8 is a flow chart for constructing Type B tokens via an arbitraryboundary removal technique, according to a feature of the presentinvention. In step 200, the CPU 12 is provided with a set (T_(c)) ofType C tokens generated with a seed size (S) via the routine of FIG. 6a, with neck removal via the routine of FIG. 7. The seed size S=S_(max),for example, S=4 pixels. In step 202, for each Type C token, t_(c) inthe set T_(c) the CPU 12 selects a number (for example 50) of potentialseeds s₁ to s_(n). In our example, each selected seed will be a 4×4pixel array from within the token region, the pixels of the array beingof approximately equal values (within the noise levels of the recordingdevice).

In step 204, the CPU 12 grows a new Type C token, utilizing the routinesof FIGS. 6 a and 7, from each seed location, s₁ to s_(n) of each tokent_(c) in the set T_(c). The newly grown tokens for each token t_(c) aredesignated as tokens r_(c1) to r_(cn). The newly grown tokens r_(c1) tor_(cn) for each token t_(c) generally overlap the original Type C tokent_(c), as well as one another.

In step 206, the CPU 12 operates to merge the newly generated tokensr_(c1) to r_(cn) of each token t_(c), respectively. The result is a newtoken R_(t) corresponding to each original token t_(c) in the set T_(c).Each new token R_(t) encompasses all of the regions of the respectiveoverlapping tokens r_(c1) to r_(cn) generated from the correspondingoriginal token t_(c). The unions of the regions comprising therespective merged new tokens R_(t) are each a more extensive token thanthe original Type C tokens of the set. The resulting merged new tokensR_(t) result in regions of the image file 18, each of a much broaderrange of variation between the pixels of the respective token R_(t) thanthe original Type C token, yet the range of variation among theconstituent pixels will still be relatively smooth. R_(t) is defined asa limited form of Type B token, Type B_(ab1), to indicate a tokengenerated by the first stage (steps 200-206) of the arbitrary boundaryremoval technique according to a feature of the present invention.

In step 208, the CPU 12 stores each of the Type B_(ab1) tokens generatedin steps 202-206 from the set of tokens T_(c), and proceeds to step 210.Type B_(ab1) tokens generated via execution of steps 202-206 may overlapsignificantly. In step 210, the CPU 12 operates to merge the R_(t)tokens stored in step 208 that overlap each other by a certainpercentage of their respective sizes. For example, a 30% overlap isgenerally sufficient to provide few, if any, false positive merges thatcombine regions containing different materials. The new set of mergedtokens still may have overlapping tokens, for example, previouslyoverlapping tokens that had a less than 30% overlap. After all mergesare complete, the CPU 12 proceeds to step 212.

In step 212, the CPU 12 identifies all pixels that are in more than onetoken (that is in an overlapping portion of two or more tokens). Eachidentified pixel is assigned to the token occupying the largest regionof the image. Thus, all overlapping tokens are modified to eliminate alloverlaps.
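Steps 210 and 212 can be sketched with pixel sets, as follows. The sketch makes two assumptions that the text leaves open: a pair of tokens is merged only when the shared pixels amount to at least the threshold fraction of each token's own size, and token sizes for step 212 are measured before any overlap is removed. The names merge_overlapping and resolve_overlaps are hypothetical.

```python
def merge_overlapping(tokens, min_overlap=0.3):
    """Step 210 sketch: repeatedly merge any pair of tokens whose shared
    pixels cover at least min_overlap of each token's size."""
    tokens = [set(t) for t in tokens]
    merged = True
    while merged:
        merged = False
        for i in range(len(tokens)):
            for j in range(i + 1, len(tokens)):
                shared = len(tokens[i] & tokens[j])
                if (shared >= min_overlap * len(tokens[i])
                        and shared >= min_overlap * len(tokens[j])):
                    tokens[i] |= tokens[j]
                    del tokens[j]
                    merged = True
                    break
            if merged:
                break
    return tokens


def resolve_overlaps(tokens):
    """Step 212 sketch: give each shared pixel to the largest token."""
    order = sorted(range(len(tokens)), key=lambda i: len(tokens[i]),
                   reverse=True)
    claimed = set()
    result = [set() for _ in tokens]
    for i in order:
        result[i] = tokens[i] - claimed
        claimed |= tokens[i]
    return result


# Hypothetical one-row tokens, given as sets of (row, column) pixels.
a = {(0, x) for x in range(0, 10)}    # 10 pixels
b = {(0, x) for x in range(5, 15)}    # 10 pixels, 5 shared with a
c = {(0, x) for x in range(14, 21)}   # 7 pixels, 1 shared with b
final_tokens = resolve_overlaps(merge_overlapping([a, b, c]))
```

Tokens a and b overlap by 50% and are merged into a 15 pixel token. Token c overlaps the merged token by a single pixel, which is below the threshold, so c stays separate and loses the shared pixel to the larger token.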

In step 214, the CPU 12 (as the Type C tokenization block 35 or theoperators block 28) stores the final set of merged and modified tokens,now designated as Type B_(ab2) tokens, and then exits the routine. Asnoted above, the Type B_(ab2) tokens were generated from Type C tokenswhether the Type C tokens were generated using N color band values (RGBin our example) of the pixel or the filter output representation of thepixel, in the event the image was filtered.

A second exemplary technique according to the present invention, forusing Type C tokens to create Type B tokens, is adjacent planar tokenmerging. The adjacent planar token merging can be implemented when animage depicts areas of uniform color, that is for non-textured regionsof an image. Initially, a token graph is used to identify tokens thatare near to one another. FIG. 9 shows a flow chart for creating a tokengraph, containing token map information, according to a feature of thepresent invention. Each token t_(c) in the set of Type C tokens T_(c),generated through execution of the routines of FIGS. 6 a and 7, isevaluated in terms of a maximum distance D_(max) between tokens defininga neighboring pair of tokens, t_(c), t_(n), of the set T_(c), a minimumnumber of token perimeter pixels, P_(min), in each token of theneighboring pair of tokens, and a minimum fraction of perimeter pixels,F_(min), of each token of a neighboring pair of tokens, required to bewithin D_(max).

In step 300, the CPU 12 selects a Type C token t_(c) in the set of TypeC tokens T_(c), and identifies the pixels of the selected token t_(c)forming the perimeter of the token. In a decision block 302, the CPU 12determines whether the number of perimeter pixels is less than P_(min),for example 10 pixels.

If yes, the CPU 12 proceeds to decision block 304 to determine whetherthere are any remaining tokens t_(c) in the set of Type C tokens T_(c).If yes, the CPU 12 returns to step 300, if no, the CPU 12 exits theroutine 306.

If no, the CPU 12 proceeds to step 308. In step 308, the CPU 12generates a bounding box used as a mask to surround the selected tokent_(c). The bounding box is dimensioned to be at least D_(max) largerthan the selected token t_(c) in all directions. A known distancetransform (for example, as described in P. Felzenszwalb and D.Huttenlocher, Distance Transforms of Sampled Functions, CornellComputing and Information Science Technical Report TR2004-1963,September 2004), is executed to find the distance from each perimeterpixel of the selected token t_(c) to all the pixels in the surroundingbounding box. The output of the distance transform comprises two maps,each of the same size as the bounding box, a distance map and a closestpixel map. The distance map includes the Euclidean distance from eachpixel of the bounding box to the nearest perimeter pixel of the selectedtoken t_(c). The closest pixel map identifies, for each pixel in thedistance map, which perimeter pixel is the closest to it.

In step 310, the CPU 12 scans the distance map generated in step 308 toidentify tokens corresponding to pixels of the bounding box (from theregion map generated via the routine of FIG. 6 a), to identify a tokenfrom among all tokens represented by pixels in the bounding box, thathas a number N_(cn) of pixels within the distance D_(max), whereinN_(cn) is greater than P_(min), and greater than F_(min) * perimeterpixels of the respective token and the average distance between therespective token and t_(c) is the lowest of the tokens corresponding tothe pixels in the bounding box. If these conditions are satisfied, therespective token is designated t_(n) of a possible token pair t_(c),t_(n), and a link L_(cn) is marked active.

In step 312, the CPU 12 checks to determine whether a reciprocal linkL_(cn) is also marked active, and when it is marked active, the CPU 12marks and stores in the token graph, an indication that the token pairt_(c), t_(n) is a neighboring token pair. The reciprocal link refers tothe link status in the evaluation of the token designated as t_(n) inthe current evaluation. If that token has yet to be evaluated, the pairis not designated as a neighboring token pair until the link L_(cn) isverified as active in the subsequent evaluation of the token t_(n). TheCPU 12 then returns to decision block 304 to determine whether there areany further tokens in the set T_(c).
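The neighbor test of steps 300 to 312 can be approximated by the brute-force sketch below, which replaces the cited distance transform with direct Euclidean distance computations for clarity. It is an interpretation under assumptions: the average distance is taken over the candidate token's pixels that lie within D_(max), and the helper names (perimeter, best_link, neighbor_pairs, block) and the small default thresholds are hypothetical.

```python
import math


def perimeter(token):
    """Pixels of the token with at least one 4-neighbor outside it."""
    return {(y, x) for (y, x) in token
            if any(nb not in token
                   for nb in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)))}


def best_link(tc_id, tokens, d_max, p_min, f_min):
    """Return the ID of the token that t_c links to, or None."""
    perim_c = perimeter(tokens[tc_id])
    if len(perim_c) < p_min:
        return None  # decision block 302: too few perimeter pixels
    best_id, best_avg = None, math.inf
    for tid, pixels in tokens.items():
        if tid == tc_id:
            continue
        near = []
        for (y, x) in pixels:
            d = min(math.hypot(y - py, x - px) for (py, px) in perim_c)
            if d <= d_max:
                near.append(d)
        # N_cn must exceed P_min and F_min * perimeter pixels of the candidate.
        if len(near) > p_min and len(near) > f_min * len(perimeter(pixels)):
            avg = sum(near) / len(near)
            if avg < best_avg:
                best_id, best_avg = tid, avg
    return best_id


def neighbor_pairs(tokens, d_max=3, p_min=3, f_min=0.3):
    """Keep only pairs whose links are active in both directions."""
    links = {tid: best_link(tid, tokens, d_max, p_min, f_min)
             for tid in tokens}
    return {tuple(sorted((a, b))) for a, b in links.items()
            if b is not None and links[b] == a}


def block(r0, c0, size=4):
    """Hypothetical square token with its top-left corner at (r0, c0)."""
    return {(y, x) for y in range(r0, r0 + size)
            for x in range(c0, c0 + size)}


tokens = {"A": block(0, 0), "B": block(0, 5), "C": block(0, 20)}
pairs = neighbor_pairs(tokens)
```

Tokens A and B sit one column apart and link to each other, so they form a neighboring pair, while the distant token C links to nothing. A practical implementation would use the distance transform described above rather than the quadratic search shown here.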

Upon completion of the token graph, the CPU 12 utilizes token pair information stored in the graph in the execution of the routine of FIG. 10. FIG. 10 shows a flow chart for constructing Type B tokens via the adjacent planar token merging technique, according to a feature of the present invention. In the adjacent planar merging technique, pairs of tokens are examined to determine whether there is a smooth and coherent change in color values, in a two dimensional measure, between the tokens of the pair. The color change is examined in terms of a planar representation of each channel of the color, for example the RGB components of the pixels according to the exemplary embodiments of the present invention. A smooth change is defined as the condition when a set of planes (one plane per color component) is a good fit for the pixel values of two neighboring tokens. In summary, neighboring tokens are considered the same material and a Type B token when the color change in a two-dimensional sense is approximately planar.

In step 320, the CPU 12 selects a token pair t_(c), t_(n) from the token graph. In decision block 322, the CPU 12 determines whether the mean color in token t_(c) is significantly different from the mean color in the token t_(n). The difference can be a function of a z-score, a known statistical measurement (see, for example, Abdi, H. (2007), Z-scores, in N. J. Salkind (Ed.), Encyclopedia of Measurement and Statistics, Thousand Oaks, Calif.: Sage), for example, a z-score greater than 3.0.

If the mean colors of the token pair are different, the CPU 12 proceeds to decision block 324 to determine whether there are any additional token pairs in the token graph. If yes, the CPU 12 returns to step 320. If no, the CPU 12 exits the routine (step 326).

If the mean colors are within the z-score parameter, the CPU 12 proceeds to step 328. In step 328, the CPU 12 performs a mathematical operation such as, for example, a least median of squares regression (see, for example, Peter J. Rousseeuw, Least Median of Squares Regression, Journal of the American Statistical Association, Vol. 79, No. 388 (December, 1984), pp. 871-880) to fit a plane to each color channel of the pixels (in our example RGB) of the token pair t_(c), t_(n), as a function of row n and column m (see FIG. 2), the planes being defined by the equations:

R = X_(R) n + Y_(R) m + Z_(R)
G = X_(G) n + Y_(G) m + Z_(G)
B = X_(B) n + Y_(B) m + Z_(B)

wherein parameter values X, Y and Z are determined by the least median of squares regression operation of the CPU 12.

Upon completion of the plane fitting operation, the CPU 12 proceeds to step 330. In step 330, the CPU 12 examines each pixel of each of the tokens of the token pair t_(c), t_(n) to calculate the z-score between each pixel of the tokens and the planar fit expressed by the equation of the least median of squares regression operation. When at least a threshold percentage of the pixels of each token of the pair (for example, 80%), are within a maximum z-score (for example, 0.75), then the neighboring token pair is marked in the token graph as indicating the same material in the image. After completion of step 330, the CPU 12 returns to decision block 324.
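Steps 328 and 330 can be illustrated for a single color channel with the sketch below. It makes two assumptions that are not in the text: an ordinary least-squares fit is substituted for the least median of squares regression to keep the example short, and the z-score of a pixel is taken as its absolute residual divided by an assumed noise standard deviation sigma. All function names are hypothetical.

```python
def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, 3):
            factor = m[i][col] / m[col][col]
            for j in range(col, 4):
                m[i][j] -= factor * m[col][j]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))) / m[i][i]
    return x


def fit_plane(samples):
    """Least-squares fit of value = X*n + Y*m + Z to (n, m, value) samples.

    Least squares is a stand-in here for least median of squares.
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atv = [0.0] * 3
    for n, m, value in samples:
        row = (float(n), float(m), 1.0)
        for i in range(3):
            atv[i] += row[i] * value
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    return solve3(ata, atv)


def fraction_within(samples, plane, sigma, max_z=0.75):
    """Fraction of pixels whose assumed z-score (|residual| / sigma) is <= max_z."""
    x, y, z = plane
    hits = sum(1 for n, m, v in samples if abs(v - (x * n + y * m + z)) / sigma <= max_z)
    return hits / len(samples)


# Two synthetic tokens lying on the same plane: value = 2n + 3m + 10.
token_c = [(n, m, 2.0 * n + 3.0 * m + 10.0) for n in range(4) for m in range(4)]
token_n = [(n, m, 2.0 * n + 3.0 * m + 10.0) for n in range(4, 8) for m in range(4)]
plane = fit_plane(token_c + token_n)
same_material = all(
    fraction_within(tok, plane, sigma=1.0) >= 0.8 for tok in (token_c, token_n)
)
```

In a full implementation the same test would be repeated for each of the R, G and B planes, with the 80% and 0.75 thresholds taken from the examples given above.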

Upon exiting the routine of FIG. 10, the CPU 12 examines the token graph for all token pairs indicating the same material. The CPU 12 can achieve the examination through performance of a known technique such as, for example, a union find algorithm. (See, for example, Zvi Galil and Giuseppe F. Italiano. Data structures and algorithms for disjoint set union problems, ACM Computing Surveys, Volume 23, Issue 3 (September 1991), pages 319-344). As a simple example, assume a set of seven Type C tokens T₁, T₂, T₃, T₄, T₅, T₆, T₇. Assume that the result of the execution of FIG. 10, (performance of the adjacent planar analysis), indicates that tokens T₁ and T₂ are marked as the same material, and tokens T₁ and T₃ are also marked as the same material. Moreover, the results further indicate that tokens T₄ and T₅ are marked as the same material, and tokens T₅ and T₆ are also marked as the same material. The result of execution of the union find algorithm would therefore indicate that tokens {T₁, T₂, T₃} form a first group within the image consisting of a single material, tokens {T₄, T₅, T₆} form a second group within the image consisting of a single material, and token {T₇} forms a third group within the image consisting of a single material. The groups {T₁, T₂, T₃}, {T₄, T₅, T₆} and {T₇} form three Type B tokens.
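The seven-token example can be reproduced with a short union find sketch. The function names and the use of string labels for tokens are illustrative assumptions only.

```python
def find(parent, x):
    """Return the root of x, halving the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group_same_material(tokens, same_material_pairs):
    """Group tokens into sets connected by same-material markings."""
    parent = {t: t for t in tokens}
    for a, b in same_material_pairs:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_b] = root_a  # union the two groups
    groups = {}
    for t in tokens:
        groups.setdefault(find(parent, t), []).append(t)
    return sorted(groups.values())


tokens = ["T1", "T2", "T3", "T4", "T5", "T6", "T7"]
pairs = [("T1", "T2"), ("T1", "T3"), ("T4", "T5"), ("T5", "T6")]
type_b_tokens = group_same_material(tokens, pairs)
```

Each resulting group corresponds to one Type B token; T7, which appears in no same-material pair, remains a group of its own.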

A third exemplary technique according to the present invention, for using Type C tokens to create Type B tokens, is a local token analysis. A local token approach generates Type C tokens using a window analysis of a scene depicted in an image file 18. Such tokens are designated as Type C_(w) tokens. FIG. 11 is a flow chart for generating Type C_(w) tokens via the local token analysis technique, according to a feature of the present invention.

In step 400, the CPU 12 places a window of fixed size, for example, a 33×33 pixel array mask, over a preselected series of scan positions over the image. The window can be a shape other than a square. The scan positions are offset from one another by a fixed amount, for example ½ window size, and are arranged, in total, to fully cover the image. The window area of pixels at each scan position generates a Type C_(w) token, though not every pixel within the window at the respective scan position is in the Type C_(w) token generated at the respective scan position.

At each scan position (step 402), the CPU 12 operates, as a function of the pixels within the window, to fit each of a set of planes, one corresponding to the intensity of each color channel (for example, RGB), and an RGB line in RGB space, characterized by a start point I₀ and an end point I₁ of the colors within the window. The planar fit provides a spatial representation of the pixel intensity within the window, and the line fit provides a spectral representation of the pixels within the window.

For the planar fit, the planes are defined by the equations:

R = X_(R) n + Y_(R) m + Z_(R)
G = X_(G) n + Y_(G) m + Z_(G)
B = X_(B) n + Y_(B) m + Z_(B)

wherein parameter values X, Y and Z are determined by the CPU 12 by executing a mathematical operation such as the least median of squares regression discussed above, a least-squares estimator, such as singular value decomposition, or a robust estimator such as RANSAC (see, for example, M. A. Fischler, R. C. Bolles. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Comm. of the ACM, Vol 24, pp 381-395, 1981).

For the RGB line fit, the line is defined by:

I(r,g,b) = I₀(r,g,b) + t(I₁(r,g,b) − I₀(r,g,b))

wherein the parameter t has a value between 0 and 1, and can be determined by the CPU 12 utilizing any of the mathematical techniques used to find the planar fit.

At each scan position, after completion of step 402, the CPU 12 operates in step 404 to examine each pixel in the window in respect of each of the planar fit representation and RGB line representation corresponding to the respective window scan position. For each pixel, the CPU 12 determines an error factor for the pixel relative to each of the established planes and RGB line. The error factor is related to the absolute distance of the pixel to its projection on either the planar fit or the RGB line fit. The error factor can be a function of the noise present in the recording equipment or be a percentage of the maximum RGB value within the window, for example 1%. Any pixel distance within the error factor relative to either the spatial planar fit or the spectral line fit is labeled an inlier for the Type C_(w) token being generated at the respective scan position. The CPU 12 also records for the Type C_(w) token being generated at the respective scan position, a list of all inlier pixels.

At each scan position, after completion of step 404, the CPU 12 operates in step 406 to assign a membership value to each inlier pixel in the window. The membership value can be based upon the distance of the inlier pixel from either the planar fit or the RGB line fit. In one exemplary embodiment of the present invention, the membership value is the inverse of the distance used to determine inlier status for the pixel. In a second exemplary embodiment, a zero-centered Gaussian distribution with a standard deviation is used to calculate membership values for the inlier pixels.
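Steps 404 and 406 can be sketched together as follows, assuming the per-pixel distances to the fit have already been computed. The error factor is taken as 1% of the maximum RGB value, as in the example above, and the Gaussian variant is used for the membership values. Setting the standard deviation equal to the error factor is an assumption of this sketch, as are the function and parameter names.

```python
import math


def label_inliers(distances, max_rgb, error_fraction=0.01, sigma=None):
    """Map inlier pixel index -> membership value for one window.

    distances: distance of each pixel to the planar fit or RGB line fit.
    max_rgb: maximum RGB value within the window (must be positive).
    sigma: Gaussian standard deviation; defaults to the error factor (assumed).
    """
    tolerance = error_fraction * max_rgb
    if sigma is None:
        sigma = tolerance
    memberships = {}
    for index, d in enumerate(distances):
        if d <= tolerance:
            # Zero-centered Gaussian: 1.0 on the fit, falling off with distance.
            memberships[index] = math.exp(-(d * d) / (2.0 * sigma * sigma))
    return memberships


memberships = label_inliers([0.0, 1.0, 5.0], max_rgb=255)
```

With these inputs the tolerance is 2.55, so the first two pixels are inliers and the third is excluded. The inverse-distance variant would need separate handling for a pixel lying exactly on the fit, where the distance is zero.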

After all of the scan positions are processed to generate the Type C_(w) tokens, one per scan position, the CPU 12 operates to compile and store a token data list (step 408). The token data list contains two lists. A first list lists all of the pixels in the image file 18, and for each pixel, an indication of each Type C_(w) token to which it is labeled as an inlier pixel, and the corresponding membership value. A second list lists all of the generated Type C_(w) tokens, and for each token an indication of the inlier pixels of the respective token, and the corresponding membership value. After compiling and storing the token data list, the CPU 12 exits the routine (step 410).

FIG. 12 is a flow chart for constructing Type B tokens from the Type C_(w) tokens generated via the local token analysis technique, according to a feature of the present invention. In step 420, the CPU 12 calculates a similarity of parameters of the spatial planar dimensions and spectral RGB lines of adjacent or overlapping Type C_(w) tokens generated through execution of the routine of FIG. 11. Overlapping and adjacent Type C_(w) tokens can be defined as tokens corresponding to scan positions that overlap or are contiguous. A similarity threshold can be set as a percentage of difference between each of the spatial planar dimensions and spectral RGB lines of two overlapping or adjacent Type C_(w) tokens being compared. The percentage can be a function of the noise of, for example, the camera 14 used to record the scene of the image file 18. All overlapping or adjacent Type C_(w) token pairs having a calculated similarity within the similarity threshold are placed on a list.

In step 422, the CPU 12 sorts the list of overlapping or adjacent Type C_(w) token pairs having a calculated similarity within the similarity threshold, in the order of most similar to least similar pairs. In step 424, the CPU 12 merges similar token pairs, in the order of the sort, and labels the pairs as per degree of similarity. Each merged token pair will be considered a Type B token. In step 426, the CPU 12 stores the list of Type B tokens, and exits the routine.

Each of the arbitrary boundary removal, adjacent planar token merging, and local token analysis techniques involves an analysis of an input image file 18 (steps 1000 and 1002 of FIG. 3) by the CPU 12 to identify Type B tokens. Pursuant to a further feature of the present invention, a computer learning technique can also be implemented for execution by the CPU 12, during the performance of step 1002, for identification of illumination and material aspects of an image, as for example, same-material regions of an image depicted in an input image file 18. In this manner, advantage can be taken of advanced machine learning techniques, including the training of a computer to recognize features indicative of material and illumination characteristics in an image, to thereby enhance the accuracy of an image segregation operation.

An example of a computer learning technique includes the step of providing a training set, with the training set comprising a series of images selected from an image database. For each of the images of the training set, local image features are identified to provide representations of, for example, features of interest and relevant to material and illumination aspects of an image. The local image features can include spatio-spectral features that provide information that can be used to indicate a likelihood that regions of an image have a same intrinsic characteristic, such as, for example, that a pair of Type C tokens of an image file, are of the same material. Parameters based upon the features are arranged in feature vectors for analysis. In a learning technique, a classifier is built from the training set, for use in the learning technique, to classify regions of the image as having a same intrinsic characteristic. The parameters of the classifier are trained on feature vectors extracted from the training set. A classifier is a process or algorithm that takes feature vectors extracted from an input image file and outputs predictions or classifications regarding illumination and material aspects of an image depicted in the input image file. The classifier is applied in an analysis of feature vectors of a selected input image file 18, to, for example, identify regions, for example, selected pairs of Type C tokens of the subject image file 18, of a same material.

FIG. 13 is a flow chart for training a classifier, in connection with a computer learning technique for use in the identification of a same intrinsic characteristic such as, for example, same-material regions of an image, according to a feature of the present invention. In step 1010, an image database is provided. The image database contains, for example, one thousand image files depicting images of various items. The items depicted in the image files of the database can be everyday items such as supermarket products recorded against a flat plain grey background. The illumination conditions at the time of recording of the images are varied from uniform illumination, to illumination conditions that cause deep shadows with broad penumbrae, to shadows with sharp penumbrae.

Pursuant to an exemplary embodiment of the present invention, the image files of the image database are each input, in a sequence, one after the other, to the CPU 12, for processing (steps 1012-1016 and 1018-1020, to 1022, as will be described), to assemble a set of feature vectors for each image file of the database. The set of feature vectors comprises one feature vector per group under examination. In our example, each group includes a pair of Type C tokens to be examined to determine whether they are of the same material. A set of candidate token pairs can be selected for each image file, based upon, for example, proximity, brightness or any other image characteristics suitable to provide a representation of the image.

As noted above, each feature vector comprises image parameters, such as spatio-spectral features of each input image file from the database, that provide information that can be used to indicate features of interest, such as, for example, features relevant to a segregation of an image into material and illumination components of the image. In an exemplary embodiment of the present invention, the features of interest indicate a likelihood that regions of an image, such as, for example, a pair of Type C tokens of the image file being examined, are of the same material. The present invention contemplates features relevant to any intrinsic characteristic between regions of an image, such as a same material, or a same illumination, or any of the spatio-spectral operator/constraint arrangements described above. The parameters, in the exemplary embodiment of a same material between regions, include, for example:

-   absolute difference of average “color” within the regions being examined, per band;
-   local derivatives computed across a boundary shared by, for example, adjacent tokens being examined;
-   comparison of color line fits;
-   comparison of color plane fits;
-   Type B token features;
-   geometry features.

Pursuant to a feature of the present invention, in determining the absolute difference of average “color” within, for example, a pair of Type C tokens, multiple color spaces are used. For example, for an image file being examined, color differences are computed in terms of, for example, each of the color bands of linear RGB values of the pixels of the image, as stored in the image file 18, corresponding log RGB values for the pixels of the image file, HSV values (according to a known hue, saturation, brightness color space), log chromaticity values, as described above, values according to a known L, a, b color space and values computed via execution of a known tone mapping process.

Local derivatives taken along a boundary between adjacent tokens can be of any order, that is, first, second and/or third order derivatives for pixels adjacent, and across from each other, along the boundary. The derivatives can also be taken relative to different color spaces, as described above in respect of the determination of absolute difference of color. Derivatives can also be taken across a virtual boundary. A virtual boundary is relative to two tokens that are not strictly adjacent, but have only blend tokens between them.

A blend token consists of pixels that straddle a distinct color change boundary. Each pixel of an image file is an average of the colors of a grid overlying a region of an image. When the grid overlies a boundary between two distinct colors, cells of the grid can include one or the other, or both colors of the boundary. Thus, the corresponding pixel will be an average of the two colors, appearing as a color that was not present in the original scene being recorded. Additionally, imperfections in lens design and manufacture, in image focus, and in the limits of resolution, can soften edges, resulting in the creation of blend pixels, such that a sharp boundary is recorded in the image as 2-3 blend pixels, for the change in the image to fully occur. The pixels of the blend tokens are each assigned to the token with the most similar color. The derivatives are then computed along the resulting new boundary.

A comparison of color line fits is implemented with the use of a robust fitting approach, such as, for example, RANSAC. An evaluation is made as to how close the RGB values of one token of the pair are to the line fit to the pixels of the other token. Thus, a line fit is computed for the RGB values of the first token of a pair of tokens, and then a calculation is made of the RMS distance between the pixels of the second token of the pair to the line of the first token, and vice versa.
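The RMS distance used in the color line comparison can be sketched as below. The line is assumed to be already fitted and is represented by its two end points I₀ and I₁; the robust fitting itself (for example, RANSAC) is outside the scope of this sketch, and the distance is measured to the infinite line through those points, which is a simplifying assumption. Function names are hypothetical.

```python
import math


def distance_to_line(pixel, i0, i1):
    """Distance from an RGB pixel to the line through the points i0 and i1."""
    direction = [b - a for a, b in zip(i0, i1)]
    offset = [p - a for a, p in zip(i0, pixel)]
    # Parameter t of the orthogonal projection of the pixel onto the line.
    t = sum(o * d for o, d in zip(offset, direction)) / sum(d * d for d in direction)
    closest = [a + t * d for a, d in zip(i0, direction)]
    return math.dist(pixel, closest)


def rms_distance(pixels, i0, i1):
    """RMS distance of a token's pixels to another token's RGB line."""
    total = sum(distance_to_line(p, i0, i1) ** 2 for p in pixels)
    return math.sqrt(total / len(pixels))


# Line of the first token along the red axis; two pixels of the second token.
line_start, line_end = (0.0, 0.0, 0.0), (10.0, 0.0, 0.0)
second_token = [(5.0, 3.0, 4.0), (2.0, 0.0, 0.0)]
feature = rms_distance(second_token, line_start, line_end)
```

The "vice versa" half of the feature is obtained by calling `rms_distance` again with the roles of the two tokens exchanged.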

A comparison of color plane fits is analogous to the color line fit feature. In this analysis, pixels of a second token are examined as to how well they fit color planes in each color band, of a first token, and vice versa. The planes are fit to (x, y, color band) in a token, with one plane per color band.

An analysis of Type B features involves the examination of same-material characteristics relevant to a Type B tokenization of an image, as described above in respect to the arbitrary boundary removal, adjacent planar token merging, and local token analysis techniques. A first Type B token feature is a simple binary indicator as to whether two tokens of a given pair are in a same Type B token, as identified via execution of a Type B tokenization of the image. A second feature relates to an RMS difference between the brightest (or darkest) part of the relevant Type B tokens to which each of the tokens of a token pair belong. This examination is useful to identify non-contiguous regions of an image that are of the same material.

For example, there is the case of one Type C token of a pair in Type B token 1, and the other Type C token of the pair in Type B token 2, with Tokens 1 and 2 being non-contiguous regions of the same material. If Token 1 spans a lit part and a shadowed part of the material and the one Type C token is within the shadowed part of Token 1 and Token 2 is fully lit, then the two Type C tokens of the pair will appear to be different since the two tokens are lit and shadowed versions of the same material. However, a comparison of the lit part of Token 1 with Token 2 provides a more meaningful comparison between the token pair, and thus serves as a distinguishing feature for the subsequent classifier.

An analysis of geometry features involves the computation of various comparisons of the geometric properties of a token pair. The comparisons include, for example, a comparison of the individual areas of the tokens of the pair, the ratio of the areas, a comparison of the individual perimeter lengths, the ratio of the perimeter lengths, and the fraction of the perimeter lengths that are shared between the pair of tokens. Binary features include an indication as to whether the two tokens of the pair are strictly adjacent, and an indication as to whether the two tokens are adjacent after a blend token removal, as described above.

Additional geometric features include features that capture spatial proximity. For example, a measure is made of the simple distance between the centroids of the tokens of the pair, and, further, a measure is made of the average distance between the boundary pixels of the token pair.
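A few of the geometry features can be sketched as follows, assuming each token is represented as a set of (row, column) pixel coordinates. Only the areas, the area ratio and the centroid distance are shown; the perimeter and adjacency features are omitted. The representation and the function names are assumptions of this sketch.

```python
import math


def centroid(token):
    """Mean (row, column) position of the pixels of a token."""
    rows = sum(r for r, _ in token)
    cols = sum(c for _, c in token)
    return (rows / len(token), cols / len(token))


def geometry_features(token_a, token_b):
    """Area and proximity features for a pair of tokens."""
    area_a, area_b = len(token_a), len(token_b)
    return {
        "area_a": area_a,
        "area_b": area_b,
        "area_ratio": area_a / area_b,
        "centroid_distance": math.dist(centroid(token_a), centroid(token_b)),
    }


token_a = {(0, 0), (0, 2)}  # centroid at (0, 1)
token_b = {(3, 1), (5, 1)}  # centroid at (4, 1)
features = geometry_features(token_a, token_b)
```

The resulting values would be concatenated with the other features to form the feature vector for the token pair.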

Referring once again to FIG. 13, in step 1012 the CPU 12 operates to identify Type C tokens in an image file from the image database, as for example, through execution of the routine of FIG. 6 a, as described above. In step 1014, the CPU 12 performs an analysis of perimeter pixels of each Type C token identified in step 1012, to define the boundaries of each of the identified tokens. The information from steps 1012 and 1014 is input to a compute boundary features module, executed by the CPU 12 (step 1016), as will appear.

In a separate operation, executed in either parallel or serial operations, the CPU 12 (step 1018) computes image bands, that is, for the pixels of the image file, the CPU 12 computes log RGB values, HSV values, log chromaticity values, L, a, b color space values and values computed via execution of a known tone mapping process. The information from step 1018 is input to the compute boundary features module (for step 1016), and to step 1020.

In step 1020, the CPU 12 receives the color band information of step 1018, and the Type C token information from step 1012, and computes Type C token statistics, for use in the generation of the feature vector. The statistics can include, for example, line and plane fit parameters, token areas, mean and average colors, in each band, for each color space, for each token, color histograms, and all of the various parameters as are required to compute the features described above, as a function of the identified tokens and the color values for the pixels of the tokens. The CPU 12 can also identify Type B tokens via the arbitrary boundary removal, adjacent planar token merging, and local token analysis techniques. Information from step 1020 is input to the compute boundary features module of step 1016.

In step 1016, the CPU 12 executes the compute boundary features module to calculate the features described above, for each candidate pair of tokens, as a function of the information generated in steps 1012, 1014, 1018 and 1020. In addition, in step 1016, the CPU 12 optionally computes characteristics of pixels in the neighborhood of a boundary between adjacent tokens, for determination of similarity to a kernel from a predefined set of kernels. To that end, a kernel bank 1024 includes a set of samples, for example, 64 samples of illumination boundaries across a same material. The CPU 12 compares the sampled pixels from a token pair from an image file of the image database, under examination, to each of the samples of the kernel bank, and calculates a score as a function of any found similarities.

In step 1022, the CPU 12 concatenates the feature values from the execution of the compute boundary features module to define a feature vector for each selected token pair of an image file (step 1016). The CPU 12 performs the steps 1012-1022 for each image file in the image database, to generate a series of sets of feature vectors, one feature vector in the set for each of the selected token pairs of an image file, one set per image file, from the image files of the image database.

Pursuant to a further feature of the present invention, in step 1026, each image file of the image database is manually coded to indicate regions of a same intrinsic characteristic in the image depicted in the respective image file. In an exemplary embodiment of the present invention, the coding is a color coding wherein a human operator views each image file and in a touch screen, brush stroke operation, marks off each region of a same material, and controls the computer to assign a different arbitrary color to each marked region. Blend tokens can remain unclassified in the color coding. This provides a ground truth version of each image file used to train the classifier.

In step 1028, an operation is performed to use the feature vectors generated by the CPU 12, and the manually color coded ground truth image files, to train the computer and generate a classifier (output 1030). Any number of well known trainable classifiers provided by machine learning technology can be used to implement step 1028, such as, for example, decision trees, neural networks, support vector machines, boosted decision trees (such as Logitboost), and so on.

FIG. 14 is a flow chart for the use of the classifier trained according to the flow chart of FIG. 13 (classifier parameters 1030), to identify adjacent Type C tokens that are likely to be of the same material, as a basis for identifying same-material constraints. In step 2000 an image file 18 is input to the CPU 12 for identification of token pairs suitable for a same-material constraint. The CPU 12 executes steps 2012-2020 to perform the exact same functions in respect to the input image file 18 as were performed in steps 1012-1020 for the image files of the image database, to generate a feature vector for each one of the selected token pairs of the input image file 18.

In step 2022, the feature vectors for the input image file 18 are each processed by the trained classifier, for example, a Logitboost classifier or a Support Vector Machine (SVM) classifier, to compute a prediction or classification score for each token pair, as a function of the classifier parameters 1030, (as determined via execution by the CPU 12 of the routine of FIG. 13). The score can be in the range of from 0 to 1, a 0 corresponding to no likelihood that tokens of a selected token pair are of the same material and 1 corresponding to a high likelihood that the same-material characteristic is true. In step 2024, a threshold value can be selected, for example, 0.8, such that a score for a token pair classified by the trained classifier that is above the threshold (YES) results in the pair being treated as being of the same material (and thus candidates for a same-material constraint) (step 2026), and a score below the threshold (NO) results in the pair being treated as not being of the same material (step 2028).
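The thresholding of steps 2024 to 2028 can be sketched as follows, assuming the classifier scores are already available as a mapping from token pair to score. The classifier itself is not modeled, and the function name and the example scores are made up for illustration.

```python
def classify_pairs(scores, threshold=0.8):
    """Map each token pair to True when its score is above the threshold."""
    return {pair: score > threshold for pair, score in scores.items()}


# Hypothetical scores for two token pairs.
scores = {("T1", "T2"): 0.93, ("T2", "T3"): 0.41}
decisions = classify_pairs(scores)
same_material_pairs = [pair for pair, same in decisions.items() if same]
```

Pairs in `same_material_pairs` are the candidates for a same-material constraint; the others are treated as not being of the same material.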

According to a further feature of the present invention, a classifier can be trained to process token pairs that are not adjacent. Such a classifier will be based upon the features described above which do not relate to adjacency aspects of the token pair. This expands the capability of the computer learning technique to consider token groups that include both adjacent and non-adjacent token pairs, with each type of group being analyzed with an appropriate classifier.

In a further exemplary embodiment of the present invention, the CPU 12 (executing as the operators block 28) compiles lists of Type B tokens separately generated through each of and/or a combination of one or more of the arbitrary boundary removal, adjacent planar token merging, and local token analysis techniques. The determination of the combination of techniques used depends in part on whether a particular region of the image was filtered because of texturing of the image. Since each Type B token generated through the described techniques likely represents a single material under varying illumination conditions, merging sufficiently overlapping Type B tokens generated through the use of varying and different techniques, provides a resulting, merged Type B token that represents a more extensive area of the image comprising a single material, and approaches the extent of a Type A token.

Sufficiently overlapping can be defined by satisfaction of certain pixel characteristic criteria, such as, for example:

-   A) The two Type B tokens have at least n of the original Type C tokens in common, for example, n=1.
-   B) The two Type B tokens have at least n pixels in common, for example, n=20.
-   C) The two Type B tokens have at least n % overlap, that is at least n % of the pixels in a first one of the two Type B tokens are also found in the second one of the two Type B tokens or vice versa, wherein, for example n %=10%.
-   D) The percentage of pixels in a smaller one of the two Type B tokens, also found in the larger one of the two Type B tokens is above a preselected threshold, for example 15%.
-   E) A preselected combination of criteria A-D.

Merging of two sufficiently overlapping Type B tokens can be accomplished via a mathematical operation such as execution of the union find algorithm discussed above. In the case of two overlapping Type B tokens that do not satisfy the above discussed criteria, the overlapping pixels of the two tokens can be assigned to the larger one of the two Type B tokens.
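Criteria B, C and D can be sketched as follows, assuming each Type B token is represented as a set of pixel coordinates. Criterion A, which needs the constituent Type C tokens, and criterion E, which combines the others, are left out. The default thresholds are the example values given above; the function name and the data layout are assumptions.

```python
def overlap_criteria(token_a, token_b, min_pixels=20, min_fraction=0.10,
                     min_small_fraction=0.15):
    """Evaluate overlap criteria B, C and D for two Type B tokens (pixel sets)."""
    common = len(token_a & token_b)
    smaller = min(len(token_a), len(token_b))
    return {
        # B: at least n pixels in common.
        "B": common >= min_pixels,
        # C: at least n % of either token is also found in the other.
        "C": (common / len(token_a) >= min_fraction
              or common / len(token_b) >= min_fraction),
        # D: share of the smaller token found in the larger one is above threshold.
        "D": common / smaller > min_small_fraction,
    }


first = {(0, c) for c in range(0, 100)}
second = {(0, c) for c in range(80, 120)}   # 20 pixels shared with first
third = {(0, c) for c in range(99, 200)}    # 1 pixel shared with first
strong = overlap_criteria(first, second)
weak = overlap_criteria(first, third)
```

In this example the first and second tokens satisfy all three criteria and would be merged, while the first and third satisfy none, so their single shared pixel would be assigned to the larger token.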

As a result of execution by the Type C tokenization block 35 and/or the operators block 28 (via the CPU 12) of the token generation and merging techniques according to features of the present invention, an image can be accurately segmented into tokens representing discrete materials depicted in the scene (Type B tokens) and tokens representing regions of robust similar color (Type C tokens), thus providing a basis for computational efficiencies, as the token representations capture spatio-spectral information of a significant number of constituent pixels. The service provider 24 stores all of the Type C and Type B tokens generated through execution of the above described token generation techniques, along with the relevant token map information, for example, as determined during execution of the adjacent planar token merging technique, and cross-references the stored operator results to the associated selected image file 18, for use in any segregation processing of the selected image.

According to a feature of the present invention, the service provider 24 also stores all Type C token pairs that have been classified as a same material as a result of the execution of a computer learning technique, for example the Logitboost classification.

In a further exemplary embodiment of the present invention, Type C tokens for an input image file 18 are clustered relative to pairwise distances, and the Type C tokens of the resulting clusters are used as the basis for constraints such as, for example, same-material constraints. This is done to avoid chaining. Chaining occurs due to the fact that a classifier is sometimes incorrect in its operation.

For example, a classifier assigns a high score to a token pair, indicating a high probability that the tokens of the pair are of the same material, when in fact, the pair is not of the same material. For instance, when a classifier assigns a high score to each of token pairs (A, B), (B, C) and (C, D), and (B, C) is wrong, then A and C are not of the same material. If constraints are defined by all three relationships, an error will occur in the material image. Clustering can be used to help avoid this situation. For example, if the token pair B and D had a low score, then A and B would be in one cluster, and C and D in another, and, thus, a constraint based upon B and C would not be used since they would be in different clusters.

Referring now to FIG. 15, there is shown a flow chart for constructing a same-material constraint based upon sparse clustering, according to a feature of the present invention.

In step 1050, the CPU 12 is given a set of indicia measuring one of similarities and dissimilarities between selected pairs of Type C tokens. The token pairs selected for a sparse clustering include adjacent token pairs and some non-adjacent token pairs. A sparse sampling of token pairs can be selected according to several different selection criteria:

1. The CPU 12 is operated to select all adjacent pairs of tokens and a random subset of non-adjacent token pairs.
2. The CPU 12 is operated to select token pairs as a function of token seed size (the number of pixels used to build the token, see the description above). For example, each seed size is assigned a probability, such as a seed size of 5 or 6 pixels has a probability of 1.0, such that all pairs of tokens, each token of the pair built with a seed size of 5 or 6 pixels, are selected. For each token pair, the probability for selection for the sparse sampling equals the probability of the first token of the prospective token pair, times the probability of the second token of the prospective pair. A seed size of 4 is assigned a probability of 0.8, a seed size of 3, 0.6, a seed size of 2, 0.3, and a seed size of 1, a probability of 0.1.
3. The CPU 12 is operated to select all pairs of adjacent tokens, as a function of a probability based upon distance between tokens of a prospective pair of tokens. The probability can decrease according to a Gaussian profile.
4. The CPU 12 is operated to integrate the pair sampling into the clustering operation of step 1054 (as described below). For example, after clustering tokens of all adjacent pairs, then for all cluster pairs, some additional number of edges between tokens in one cluster and tokens in the other cluster of a pair are sampled. For example, log (n) edges are sampled where n is the number of tokens in the smaller cluster in a cluster pair.
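The seed-size criterion (the second item above) can be sketched as follows, using the example probabilities from the text. The function names, the dictionary layout and the use of a seeded random generator are assumptions of this sketch.

```python
import random

# Example selection probabilities per token seed size, as listed above.
SEED_PROBABILITY = {1: 0.1, 2: 0.3, 3: 0.6, 4: 0.8, 5: 1.0, 6: 1.0}


def pair_probability(seed_a, seed_b):
    """Selection probability of a pair: product of the two token probabilities."""
    return SEED_PROBABILITY[seed_a] * SEED_PROBABILITY[seed_b]


def sample_pairs(pairs, seed_size, rng):
    """Keep each candidate pair with its seed-size based probability."""
    selected = []
    for a, b in pairs:
        if rng.random() < pair_probability(seed_size[a], seed_size[b]):
            selected.append((a, b))
    return selected


seed_size = {"T1": 5, "T2": 6, "T3": 1}
pairs = [("T1", "T2"), ("T1", "T3"), ("T2", "T3")]
selected = sample_pairs(pairs, seed_size, random.Random(0))
```

A pair built entirely from seed sizes of 5 or 6 has probability 1.0 and is always kept, while a pair involving a token of seed size 1 is kept at most 10% of the time.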

For the indicia of one of similarities and dissimilarities, the classifier provides, for example, a score of between 0 and 1 for each pair of Type C tokens, via execution of the routine of FIG. 14. As noted above, in our example, a score of 0 corresponds to no likelihood that tokens of a selected token pair are of the same material and a score of 1 corresponds to a high likelihood that the same-material characteristic is true (an indicia of similarity). Token pairs with both high scores and low scores are processed in the clustering operation.

In step 1052, the CPU 12 transforms probabilities represented by the score information into pairwise distance measurements between token pairs. For example, distance=(1−probability) or distance=−log(probability). The pairwise connections expressed by the distance measurements are considered in terms of a graph structure. In the graph, each token is a node, and there is an edge between each selected pair of nodes. Each edge is assigned the distance for the token pair. A subset of all possible pairs is selected, as described above, to minimize the computational overhead for the computer system.
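As a minimal sketch of step 1052, the two example transforms can be written in Python as follows. The function names, the dictionary representation of the edge list and the clamping of the probability before taking the logarithm are assumptions made for this sketch.

```python
import math


def to_distance(probability, mode="linear"):
    """Convert a same-material score in [0, 1] into an edge distance."""
    if mode == "linear":
        return 1.0 - probability
    if mode == "log":
        # Assumption: clamp away from zero so the logarithm stays finite.
        return -math.log(max(probability, 1e-12))
    raise ValueError("unknown mode: " + mode)


def build_edges(scores, mode="linear"):
    """scores: dict mapping (token_a, token_b) to a classifier score.

    Returns a dict with the same keys, each edge holding its distance.
    """
    return {pair: to_distance(p, mode) for pair, p in scores.items()}
```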

In step 1054, the CPU 12 performs a clustering operation. The clustering involves the execution of an algorithm to find groups of nodes with short edges (i.e. a high probability or likelihood of being the same material). Each group should not contain any node pairs with a long edge (low probability). According to a feature of the present invention, a sparse clustering algorithm is used to cluster the nodes because of the sampling of a subset of all nodes, that is the selected pairs of tokens, as selected according to the criteria described above. In an exemplary embodiment of the present invention, the hierarchical clustering algorithm described in G. N. Lance and W. T. Williams “A General Theory of Classificatory Sorting Strategies: I. Hierarchical Systems” The Computer Journal 1967 9 (4) pages 373-380, is adapted to work on sparse pairwise distances, and implemented.

Any clustering algorithm that can be adapted to work on sparse pairwise distances can be implemented to cluster nodes with short edges into groups. For example, a simple modification to a hierarchical agglomerative clustering technique, such as the technique cited above, is an assumption that the properties of edges that are known are representative of the properties of all edges between two groups. Thus, a sparse hierarchical agglomerative clustering algorithm is stated as follows:

-   -   1. Sample and record distances between a subset of data point pairs.
    -   2. Initialize each cluster to a single point.
    -   3. Scan through the edge list, finding the smallest edge (shortest distance between two clusters).
    -   4. Merge the two clusters connected by that edge and update the edge list. If clusters i and j are merged into cluster k, then update the distance to cluster h, as follows:
    -   (a) If there is no edge between j and h, then d_(kh)=d_(ih).
    -   (b) If there is no edge between i and h, then d_(kh)=d_(jh).
    -   (c) If there is an edge between i and h and between j and h, then set d_(kh) as per:
        d_(hk)=α_(i)d_(hi)+α_(j)d_(hj)+βd_(ij)+γ|d_(hi)−d_(hj)|
    -   where the parameters α_(i), α_(j), β and γ can be set according to several criteria, such as:
    -   i. Minimum Link: where α_(i)=α_(j)=0.5, β=0 and γ=−0.5
    -   In this mode, the distance between the third cluster h and the newly formed cluster k is the minimum of the distances between h and the original clusters i and j:
        d_(hk)=0.5*d_(hi)+0.5*d_(hj)−0.5*|d_(hi)−d_(hj)|
    -   ii. Average Link: where α_(i)=n_(i)/(n_(i)+n_(j)), α_(j)=n_(j)/(n_(i)+n_(j)), β=0 and γ=0
    -   In this mode, the distance between h and k is the weighted average of the distance between h and i and between h and j. The weighting is provided by the number of points in each cluster, n, and indicates that the distance between h and k is the average between points in h and points in k.
    -   iii. Centroid: where α_(i)=n_(i)/(n_(i)+n_(j)), α_(j)=n_(j)/(n_(i)+n_(j)), β=−α_(i)*α_(j) and γ=0
    -   In this mode, when merging clusters i and j to form cluster k, k is considered to be at the centroid of i and j, provided the pairwise distance matrix represents squared Euclidean distance. That is, if x_(i) is the centroid of cluster i, then: x_(k)=(n_(i)x_(i)+n_(j)x_(j))/(n_(i)+n_(j)). The squared distance between cluster h and cluster k is:
        d_(hk)=[x_(h)−(n_(i)x_(i)+n_(j)x_(j))/(n_(i)+n_(j))]²
    -   by multiplying terms, and rearranging, with n_(k)=n_(i)+n_(j):

$\begin{matrix} d_{hk} & = \frac{n_{i}}{n_{k}}\left( x_{h} - x_{i} \right)^{2} + \frac{n_{j}}{n_{k}}\left( x_{h} - x_{j} \right)^{2} - \frac{n_{i}n_{j}}{n_{k}^{2}}\left( x_{i} - x_{j} \right)^{2} \\ & = \frac{n_{i}}{n_{k}}d_{hi} + \frac{n_{j}}{n_{k}}d_{hj} - \frac{n_{i}n_{j}}{n_{k}^{2}}d_{ij} \end{matrix}$

    -   which illustrates where the specified values of α_(i), α_(j), β and γ come from.
    -   iv. Median: where α_(i)=α_(j)=0.5, β=−0.25 and γ=0
    -   This is similar to the centroid mode, but pretends the two cluster sizes are equal.
    -   In the case of squared Euclidean distance, this indicates that the apparent location of new cluster k is halfway between i and j, even if one cluster has many more points than the other. The term “median” is used because the new distance vector from h to k lies along the median of the triangle from h to j to i.
    -   v. Flexible: where α_(i)=α_(j)=0.625, β=−0.25 and γ=0
    -   Every distance update rule affects the space of the data points. Some modes, like min link, contract the space: when creating a new group, that new group now appears closer to some or all of the other existing groups. This is what can lead to undesirable chaining. During clustering, space contracting methods tend to have single points joining existing large clusters rather than starting new clusters with other singletons. Other modes dilate the space: on formation, a cluster is now further from other groups than it used to be. In such modes, single points tend to join other single points to create new clusters, and will often cluster together outliers even though the points may be very different. Both centroid and average link are space conserving, neither dilating nor contracting space. The flexible mode is defined by setting α_(i)+α_(j)+β=1, α_(i)=α_(j), β<1 and γ=0. As β is varied from near 1 to −1, the clustering behavior moves from strongly space-contracting to strongly space-dilating. At β=−0.25, the clustering is slightly dilating.
    -   5. Repeat steps 3 and 4 until there are no more remaining merges below a threshold level or until a preselected number of clusters is reached.
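The steps above can be sketched in Python as follows, as one possible reading of the sparse algorithm and not a definitive implementation. The function names, the use of a dictionary keyed by unordered cluster pairs as the edge list, and the numbering of newly formed clusters are assumptions made for this sketch.

```python
def coefficients(mode, n_i, n_j):
    """Return (alpha_i, alpha_j, beta, gamma) for the update rule of step 4(c)."""
    total = n_i + n_j
    if mode == "min":
        return 0.5, 0.5, 0.0, -0.5
    if mode == "average":
        return n_i / total, n_j / total, 0.0, 0.0
    if mode == "centroid":
        a_i, a_j = n_i / total, n_j / total
        return a_i, a_j, -a_i * a_j, 0.0
    if mode == "median":
        return 0.5, 0.5, -0.25, 0.0
    if mode == "flexible":
        return 0.625, 0.625, -0.25, 0.0
    raise ValueError("unknown mode: " + mode)


def sparse_agglomerative(edges, n_points, mode="average",
                         threshold=float("inf"), n_clusters=1):
    """edges: dict {(a, b): distance} over a sampled subset of point pairs."""
    members = {p: [p] for p in range(n_points)}          # step 2
    dist = {frozenset(pair): d for pair, d in edges.items()}
    next_id = n_points
    while len(members) > n_clusters and dist:
        pair, d_ij = min(dist.items(), key=lambda kv: kv[1])   # step 3
        if d_ij >= threshold:                            # step 5 stopping rule
            break
        i, j = tuple(pair)
        n_i, n_j = len(members[i]), len(members[j])
        k, next_id = next_id, next_id + 1
        updated = {}
        for h in members:                                # step 4
            if h in (i, j):
                continue
            d_hi = dist.pop(frozenset((h, i)), None)
            d_hj = dist.pop(frozenset((h, j)), None)
            if d_hi is None and d_hj is None:
                continue                                 # h stays unconnected to k
            if d_hj is None:
                d_hk = d_hi                              # case (a)
            elif d_hi is None:
                d_hk = d_hj                              # case (b)
            else:                                        # case (c)
                a_i, a_j, beta, gamma = coefficients(mode, n_i, n_j)
                d_hk = (a_i * d_hi + a_j * d_hj + beta * d_ij
                        + gamma * abs(d_hi - d_hj))
            updated[frozenset((h, k))] = d_hk
        del dist[pair]
        dist.update(updated)
        members[k] = members.pop(i) + members.pop(j)
    return sorted(sorted(group) for group in members.values())
```

In this sketch, clusters that share no sampled edge are never merged, which reflects the assumption stated above that the known edges are representative of all edges between two groups.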

In step 1056, the CPU 12 defines same-material constraints based upon the identified clusters of tokens. For example, each pair of tokens in a cluster can be the basis of a same-material constraint, or a star node or Steiner node constraint can be defined relative to each token of a cluster, and a new variable.

FIG. 16 is a flow chart for a further method for clustering same-material regions of an image, according to a feature of the present invention. In step 2050, the CPU 12 is given a set of indicia of one of similarities and dissimilarities between selected pairs of Type C tokens, as in the routine of FIG. 15. Steps 2052 and 2054 are performed separately, either in parallel or serially. In step 2052, the CPU 12 stores all of the indicia, in terms of scores determined by the classifier through execution of the routine of FIG. 14, for all the token pair edges, in a priority queue. In step 2054, the CPU 12 initializes an illumination field, for an illumination image, to white, and a material field, for a material image, to the image depicted in an input image file 18.

In step 2056, the CPU 12 removes the edge from the priority queue, for the token pair with the highest score. In step 2058, the CPU 12 decides if the edge with the highest score is sufficiently high to continue the execution of the routine of FIG. 16. For example, a threshold may be set for a score. If the edge with the highest score is not higher than the threshold value, the routine is stopped (step 2060). If the score exceeds the threshold, the CPU 12 continues on to step 2062. The threshold can be set at, for example, 0.8.

In step 2062, the token pair with the highest score is merged as a single material, and the CPU 12 operates to generate illumination and material images, as described in the present detailed description.

In step 2064, the CPU 12 assembles statistics regarding the quality of the material and illumination images generated with the assumption that a current token pair with a highest score is a same material match. The statistics can include, for example, information on features such as the variety of colors in the illumination map, the prevalence and strength of concurrent boundaries in the material and illumination images, the smoothness of the illumination image and the piecewise uniformity in the material image.

A secondary classifier is trained, for example, as in the routine of FIG. 13, with the features set forth above, such as absolute difference of average “color” within the regions being examined, per band; local derivatives computed across a boundary shared by, for example, adjacent tokens being examined; comparison of color line fits; comparison of color plane fits; Type B token features; and geometry features, as well as the features of the variety of colors in the illumination map, the prevalence and strength of concurrent boundaries in the material and illumination images, the smoothness of the illumination image and the piecewise uniformity in the material image.

In step 2066, the secondary classifier is applied to classify the current token pair with the highest score, for example, as per the routine of FIG. 14, to classify the probability that the tokens of the current token pair are of the same material.

In step 2068, the CPU 12 determines whether the probability is sufficiently high to assume that the tokens of the current pair are of the same material. Again, the decision can be based upon a threshold value for a probability score, for example, 0.8. If no, the CPU 12 proceeds to step 2070 to undo the merge, restate the probability score for the token pair, and return the token pair to the queue, and step 2056. If yes, the CPU 12 proceeds to step 2072 to retain the merge, and return to step 2056.
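The control flow of steps 2052 through 2072 can be sketched in Python as follows, under stated assumptions. The callback `secondary_probability` is hypothetical and stands in for the generation of the intrinsic images, the assembly of statistics and the secondary classifier. Restating the score as the secondary probability, and the `max_steps` guard against endless re-queuing, are likewise assumptions of this sketch and are not specified in the text.

```python
import heapq


def greedy_same_material(scores, secondary_probability,
                         score_threshold=0.8, accept_threshold=0.8,
                         max_steps=10_000):
    """scores: dict mapping a token pair to its primary classifier score.

    secondary_probability(pair, trial_merges): hypothetical callback returning
    the secondary classifier probability for the tentative merge.
    Returns the list of retained same-material pairs.
    """
    # heapq is a min-heap, so scores are negated to pop the highest first.
    queue = [(-score, pair) for pair, score in scores.items()]   # step 2052
    heapq.heapify(queue)
    retained = []
    for _ in range(max_steps):
        if not queue:
            break
        neg_score, pair = heapq.heappop(queue)                   # step 2056
        if -neg_score <= score_threshold:                        # steps 2058, 2060
            break
        trial = retained + [pair]                                # step 2062
        probability = secondary_probability(pair, trial)         # steps 2064, 2066
        if probability >= accept_threshold:                      # step 2068
            retained = trial                                     # step 2072
        else:
            # Step 2070: undo the merge, restate the score, return pair to queue.
            heapq.heappush(queue, (-probability, pair))
    return retained
```

With both thresholds at 0.8, a rejected pair re-enters the queue with a score below the stopping threshold, so in this sketch it ends the routine if it is ever popped again.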

Thus, the routine of FIG. 16 is an iterative process to parse through each pair of tokens to determine same material pairs. Pursuant to a further feature of the present invention, a plurality of secondary classifiers is provided. For example, one classifier is trained on early-stage merges and is used when the first few token pairs are evaluated, a mid-stage classifier is used after several merges have been retained, and a late-stage classifier is used during evaluation of the last few token pairs.

In our example of a same illumination constraint, the service provider 24 identifies Type C and Type B tokens as the operators required by the selected constraint. The Type C tokenization block 35 generated the Type C tokens. The service provider 24 operates the operators block 28 to execute the above described techniques, to generate the relevant Type B tokens for the image 32, as well as a token map. The constraint builder 26 organizes the generated token operators according to the exemplary matrix equation, [A][x]=[b], for input to the solver 30. In the same illumination constraint, the constraining relationship of the relevant constraint generator software module is that adjacent Type C tokens, as indicated by the token map information, are lit by the same illumination, unless the adjacent Type C tokens are part of the same Type B token.

Each Type C token stored by the service provider 24 is identified by a region ID, and includes a listing of each constituent pixel by row and column number. Each pixel of a Type C token will be of approximately the same color value, for example, in terms of RGB values, as all the other constituent pixels of the same Type C token, within the noise level of the equipment used to record the image. An average of the color values for the constituent pixels of each particular Type C token can be used to represent the color value for the respective Type C token. Each Type B token is identified by constituent Type C tokens, and thus can be processed to identify all of its constituent pixels via the respective constituent Type C tokens.

Pursuant to a feature of the present invention, a model for image formation reflects the basic concept of an image as comprising two components, material and illumination. This relationship can be expressed as: I=ML, where I is the image color, as recorded and stored in the respective image file 18, M the material component of the recorded image color and L the illumination component of the recorded image color. The I value for each Type C token is therefore the average color value for the recorded color values of the constituent pixels of the token.

Thus: log(I)=log(ML)=log(M)+log(L). This can be restated as i=m+l, where i represents log(I), m represents log(M) and l represents log(L). In the constraining relationship of the same illumination constraint, in an example where three Type C tokens, a, b and c, (see FIG. 17) are adjacent (and not within the same Type B token, as can be shown by a comparison of row and column numbers for all constituent pixels), l_(a)=l_(b)=l_(c). Since: l_(a)=i_(a)−m_(a), l_(b)=i_(b)−m_(b), and l_(c)=i_(c)−m_(c), these mathematical relationships can be expressed as (1)m_(a)+(−1)m_(b)+(0)m_(c)=(i_(a)−i_(b)), (1)m_(a)+(0)m_(b)+(−1)m_(c)=(i_(a)−i_(c)) and (0)m_(a)+(1)m_(b)+(−1)m_(c)=(i_(b)−i_(c)).

FIG. 17 shows a representation of an [A][x]=[b] matrix equation for the mathematical relationships of the example of the three adjacent Type C tokens a, b and c described above, as constrained by the same illumination constraint: the adjacent Type C tokens a, b and c are at the same illumination. In the matrix equation of FIG. 17, the various values for the log(I), in the [b] matrix, are known from the average recorded pixel color values for the constituent pixels of the adjacent Type C tokens a, b and c, generated by the Type C tokenization block 35 from the image selected for segregation. The [A] matrix of 0's, 1's and −1's, is defined by the set of equations expressing the selected same illumination constraint, as described above. The number of rows in the [A] matrix, from top to bottom, corresponds to the number of actual constraints imposed on the tokens, in this case three, the same illumination between three adjacent Type C tokens. The number of columns in the [A] matrix, from left to right, corresponds to the number of unknowns to be solved for, again, in this case, three. Therefore, the values for the material components of each Type C token a, b and c, in the [x] matrix, can be solved for in the matrix equation. It should be noted that each value is actually a vector of three values corresponding to the RGB color bands of our example.

Accordingly, the matrix equation of FIG. 17, as arranged by the constraint builder 26, is input by the constraint builder 26 to the solver 30 for an optimized solution for the values of the material components of the adjacent Type C tokens a, b and c of the selected image. As noted above, in the exemplary GUI embodiment of the present invention, a user selects one of several mathematical techniques for finding the optimal solution to the system of constraint equations, [A][x]=[b]. The CPU 12 configures the solver 30 according to the mathematical operation selected by the user.

For example, in a standard least squares solver, the matrix equation is restated as min_(x)(Ax−b)². The solver 30 then executes the least squares operation to determine optimized values for each of m_(a), m_(b) and m_(c). The solver 30 can then proceed to generate and display a material image based upon the optimal m_(a), m_(b) and m_(c) values. In the material image, the m_(a), m_(b) and m_(c) values are substituted for the originally recorded RGB values, for each pixel of the respective tokens. The solver 30 can proceed to also generate an illumination image from the known recorded image values i_(a), i_(b), i_(c), and the determined m_(a), m_(b) and m_(c) values, utilizing the model expressed by i=m+l. Each of the material and illumination images (the intrinsic images) are displayed on the monitor 20, via, for example, the GUI (see FIG. 5) and can be stored by the service provider 24, and cross-referenced to the original image file 18.
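As an illustration of the three-token example, the [A][x]=[b] system and a least squares solution can be sketched in Python with NumPy as follows. The function name, the sample color values and the use of `numpy.linalg.lstsq` as the least squares routine are assumptions made for this sketch.

```python
import numpy as np


def same_illumination_system(log_image, adjacent_pairs):
    """Build [A] and [b] for the same illumination constraint.

    log_image: (n_tokens, n_bands) array of log average token colors (the i values).
    adjacent_pairs: list of (p, q) token index pairs constrained to share illumination.
    """
    n_tokens, n_bands = log_image.shape
    A = np.zeros((len(adjacent_pairs), n_tokens))
    b = np.zeros((len(adjacent_pairs), n_bands))
    for row, (p, q) in enumerate(adjacent_pairs):
        A[row, p] = 1.0                        # (1) m_p
        A[row, q] = -1.0                       # (-1) m_q
        b[row] = log_image[p] - log_image[q]   # i_p - i_q
    return A, b


# Hypothetical average RGB values for tokens a, b and c.
i_abc = np.log(np.array([[0.8, 0.4, 0.2],
                         [0.4, 0.4, 0.4],
                         [0.2, 0.6, 0.8]]))
A, b = same_illumination_system(i_abc, [(0, 1), (0, 2), (1, 2)])

# Least squares estimate of the log material values, one column per color band.
m, *_ = np.linalg.lstsq(A, b, rcond=None)

# Illumination follows from the model i = m + l.
l = i_abc - m
```

In this example the three rows fix only differences between material values, so the system is rank deficient and `numpy.linalg.lstsq` returns the minimum-norm solution among the minimizers. The recovered illumination values are nonetheless equal across the three tokens, as the constraint requires.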

Referring once again to FIG. 3, step 1004 represents the storing of the intrinsic images in the memory 16 of the computer system 10. Pursuant to a further feature of the present invention, the CPU 12 operates (step 1006) to utilize the intrinsic images in a further processing of the intrinsic images. For example, the CPU 12 can process one or more of the intrinsic images in the performance of tasks such as object recognition or optical character recognition, color manipulation, editing, intensity adjustment, shadow removal or adjustment, adjustment of geometry, or other illumination and/or material manipulation, and so on, and/or compilation of information related to the image, and/or objects depicted in the image.

According to a further feature of the present invention, the solver 30 can be configured to introduce factors including bounds that capture the limits of real world illumination and material phenomena, to keep material/illumination values determined by the optimization procedures as solutions, [x], to within physically plausible ranges. This can be implemented, for example, in an iterative technique to introduce additional inequality constraints on out-of-bounds values in [x], at each iteration, and executed to resolve towards values within the defined bounds. Thus, the above described least squares technique can be augmented to include minimum and maximum bounds on individual material estimates (as expressed by the entries of [x]). Moreover, the entries of [x] can be regularized such that the material estimates are consistent with a priori knowledge of material properties.

In an exemplary embodiment of the present invention, the matrices used in the least squares solver to specify the selected constraints, [A] and [b], are subject to the following bounds, expressed by the problem:

a linear least squares formulation:
min_(x′): Σ_(i)(A_(i)^(T)x′−t_(i))²
subject to:

-   -   x′≥α_(m)1
    -   x′≤ω_(m)1
    -   x′≥img_(j)

where 1 denotes the vector of all ones, α_(m), the darkest possible material value (for example, a material cannot be darker than coal), and ω_(m), the brightest possible material value. The img_(j) value is the log intensity value at a particular token j, to provide a constraint based upon the real world observation that a segregated material color cannot be darker than it appeared in the original image, since illumination can only brighten the apparent color of an observed material.

In the linear least squares formulation, the unique minimum solution for x′ is the material map that minimizes, in a linear system expressed by A^(T)Ax′=A^(T)t, the average squared difference between the target material differences t_(i) and the estimated differences A_(i)^(T)x′. For example, if the “ith” constraint A_(i) dictates that two tokens a & b are the same material, A^(T)Ax′ takes the difference between the values of tokens a & b in x′ and computes the distortion from the target value t_(i)=0.

The inequalities expressed by the “subject to” bounds set forth above, form a feasible set of material solutions x′ which satisfy the real world constraints of possible maximum and minimum material color values. This differs from the standard, known least squares solution in that x′, if not further constrained by the “subject to” bounds, could take on a value at a given location of an image (for example, at a particular pixel or token) that violates the real world observations of reflectance, yet achieves a more optimal solution for the min x′ formulation.

In the optimization process executed by the solver 30, whenever any tokens have material color values that violate the “subject to” inequalities, at a particular iteration of the process, additional temporary constraints are added that pin the material values in violation, to values that satisfy the bounding conditions. Thus, the original matrices [A] and [b] are augmented with new matrices specifying the new bounding constraints A_(bounds) and b_(bounds) (as an expression of the “subject to” bounds) to define a new augmented system of matrix equations [A; A_(bounds)][x]=[b; b_(bounds)]. The augmented system of equations can be solved analogously to the original system, for example, using the known least squares procedure.

In accordance with the above described bounded feature of the present invention, additional, temporary constraints are added whenever color values violate real world phenomena. A re-solving of the augmented equations can be repeated, as necessary, starting with the original system A^(T)Ax′=A^(T)t, each time (i.e. the temporary bounding constraints need not be carried over between iterations), and iteratively solving augmented systems A′^(T)A′x′=A′^(T)t′ until the “subject to” bounds are satisfied.
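The bounded procedure can be sketched in Python with NumPy as follows, for a single color band. This is one possible reading only: the function name, the representation of each pin as a heavily weighted extra row, the `pin_weight`, `tol` and `max_rounds` parameters, and the folding of the α_(m) and img_(j) lower bounds into one per-token `lower` array are all assumptions made for this sketch.

```python
import numpy as np


def bounded_least_squares(A, t, lower, upper,
                          pin_weight=1e3, tol=1e-4, max_rounds=20):
    """Iteratively pin out-of-bounds entries of x with temporary constraints.

    A: (n_constraints, n_tokens) matrix; t: (n_constraints,) targets.
    lower, upper: (n_tokens,) arrays of per-token bounds.
    """
    n_tokens = A.shape[1]
    # Start from the solution of the original, unbounded system.
    x, *_ = np.linalg.lstsq(A, t, rcond=None)
    for _ in range(max_rounds):
        low = np.flatnonzero(x < lower - tol)
        high = np.flatnonzero(x > upper + tol)
        if low.size == 0 and high.size == 0:
            break                                # the bounds are satisfied
        pinned = np.concatenate([low, high])
        targets = np.concatenate([lower[low], upper[high]])
        # Temporary constraints: one weighted row per violating entry.
        A_bounds = np.zeros((pinned.size, n_tokens))
        A_bounds[np.arange(pinned.size), pinned] = pin_weight
        b_bounds = pin_weight * targets
        # Augment the ORIGINAL system each round; pins are not carried over.
        A_aug = np.vstack([A, A_bounds])
        t_aug = np.concatenate([t, b_bounds])
        x, *_ = np.linalg.lstsq(A_aug, t_aug, rcond=None)
    return x
```

Because the pins are soft constraints in this sketch, a pinned value can sit marginally outside its bound, which is why violations are tested against a small tolerance.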

In accordance with yet another feature of the present invention, an L₁, L_(∞) objective function provides a regularization of the optimized solution by encoding a preference for a small number of material changes. In effect, the L₁, L_(∞) solver includes the a priori belief that material maps should contain a small number of materials in a figure-of-merit. In the solver of the system, there is a distinction between the objective function, a formula that assigns a figure-of-merit to every possible solution, and the algorithm used to find a solution, an optimal value according to a given objective function. As the problem in our exemplary embodiment is stated as a minimization, min_(x′): Σ_(i)(A_(i)^(T)x′−t_(i))², the value an objective function assigns can be characterized as a “cost.”

In our problem, let x′ be a matrix of a number of rows of tokens and a number of columns of color bands, where x′^(c) denotes the c^(th) column associated with the c^(th) color band. The least squares objective function, in formula, is augmented, as follows:
min_(x′): Σ_(c)Σ_(i)(A_(i)^(T)x′^(c)−t^(c)_(i))²+γΣ_(k|t_(k))max_(c)|A_(k)^(T)x′^(c)|
where γ, with γ>0, governs the trade-off between the cost associated with the least squares term and the L₁, L_(∞) penalty. The expression Σ_(k|t_(k))max_(c)|A_(k)^(T)x′^(c)| accumulates the maximum per-channel absolute difference over all the same material constraints in [A].

For example, given a same-material constraint between tokens a & b, the L₁, L_(∞) function will only include a term for a color channel with the largest difference in between x^(c)_(a) and x^(c)_(b) over color channel c. In an exemplary embodiment of the present invention, the optimization procedure, for example as expressed by the objective function min_(x′): Σ_(c)Σ_(i)(A_(i)^(T)x′^(c)−t^(c)_(i))²+γΣ_(k|t_(k))max_(c)|A_(k)^(T)x′^(c)|, is a shrinkage technique. That is, a sequence of least squares problems is solved in a manner wherein, at each round, constraint targets determined to violate the same-material constraint are shrunk. At the end of the sequence, constraints with a value below a given threshold are culled from the constraint system, and a new least squares solution is computed. It should be noted that bounds such as the “subject to” bounds discussed above, can be added to the objective function to provide a bounded L₁, L_(∞) solver.
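To make the augmented objective concrete, its evaluation for a candidate solution can be sketched in Python with NumPy as follows. The sketch computes the cost only and does not implement the shrinkage procedure. The function name and the `same_material_rows` argument, which lists the rows of [A] treated as same-material constraints, are assumptions made for this sketch.

```python
import numpy as np


def l1_linf_cost(A, x, t, same_material_rows, gamma):
    """Least squares cost plus the L1, L-infinity penalty.

    A: (n_constraints, n_tokens); x: (n_tokens, n_bands); t: (n_constraints, n_bands).
    same_material_rows: indices of the rows of A that are same-material constraints.
    gamma: positive trade-off weight between the two terms.
    """
    residual = A @ x - t
    least_squares = np.sum(residual ** 2)          # summed over constraints and bands
    differences = np.abs(A[same_material_rows] @ x)
    # Max over color bands (L-infinity), then sum over constraints (L1).
    penalty = np.sum(np.max(differences, axis=1))
    return least_squares + gamma * penalty
```

For a single same-material constraint between two tokens whose values differ by 0.4, 0.1 and 0.0 in the three bands, only the 0.4 difference enters the penalty term.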

FIG. 18 is a generalized functional block diagram for the service provider 24 and constraint builder 26. To summarize the above described constraint examples in a general scheme, a selection is made of an image 32, and a number of constraint generators from a set of constraint generators 1, 2, . . . N, (the constraint generator software modules) for example, by a user, via the GUI. The set of constraint generators 1-N includes the constraints described above, and any additional constraining relationships developed as a function of spatio-spectral information for an image. A constraint can also include same-material constraints based upon Type C tokens found to have a high likelihood of being of the same material through execution of the computer learning technique described above, or any tokens within a cluster, as described above. The above described set of constraints is provided as an example. The present invention contemplates any constraining relationship based upon spatio-spectral operators, that provides a logical deduction regarding material and illumination aspects of an image, and thus a basis for constructing matrices [A] and [b] to define a set of equations whose optimal solution captures intrinsic illumination and material components of a given image.

Likewise, a set of operators 1-M, generated by the Type C tokenization block 35 or the operators block 28, includes all operators defined in the constraint generator modules 1-N. As shown in FIG. 18, the service provider 24 provides all of the operators 1-M, as required by the selected constraint generators 1-N and further couples the selected constraint generators 1-N to a constraint assembly 39 via a logical switch 40 (both configured within the constraint builder 26). In the event any of the operators 1-M for a selected image 32 are not already stored by the service provider 24, the service provider 24 utilizes the operators block 28 to compute such operators on demand, in the manner described above. The constraint assembly 39 constructs a separate [A][x]=[b] matrix for each one of the selected constraint generators, as a function of the operators and the constraining relationships defined in the respective constraint generators 1-N. In each case, the [A][x]=[b] matrix is constructed in a similar manner as described above for the same illumination example.

Upon completion of the construction of the system of equations [A]_(i)[x]=[b]_(i), for each of the selected constraint generators, i={1, 2, . . . N}, the constraint assembly 39 concatenates the constituent matrices [A]_(i), [b]_(i), from each constraint generator. Since each of the concatenated equations may contain a different subset of the unknowns, [x], the assembly is performed such that corresponding columns of individual matrices [A]_(i), that constrain particular unknowns in [x], are aligned. The concatenated matrices, [A][x]=[b], are then input to the solver 30, for solution of the unknowns in the complete [x] vector, pursuant to the selected optimization procedure, for output of intrinsic images 34. The individual constraints within the concatenated matrices, [A][x]=[b], can be weighted relative to one another as a function of factors such as perceived importance of the respective constraint, strength or empirically determined confidence level.
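The column alignment and weighting performed during concatenation can be sketched in Python as follows. The function name, the tuple layout of each constituent system and the choice to apply a weight by scaling both the row of [A] and the entry of [b] are assumptions made for this sketch.

```python
def concatenate_constraints(systems):
    """Stack per-generator systems into one aligned [A][x]=[b] system.

    systems: list of (unknown_names, A_rows, b_values, weight) tuples, where
    A_rows[r][c] is the coefficient of unknown_names[c] in row r.
    Returns (column_names, A, b) with one column per distinct unknown.
    """
    # Assign each distinct unknown a column in the complete [x] vector.
    columns = {}
    for names, _, _, _ in systems:
        for name in names:
            columns.setdefault(name, len(columns))

    A, b = [], []
    for names, rows, values, weight in systems:
        for row, value in zip(rows, values):
            full_row = [0.0] * len(columns)
            for name, coefficient in zip(names, row):
                # Place the coefficient in the column of its unknown.
                full_row[columns[name]] = weight * coefficient
            A.append(full_row)
            b.append(weight * value)
    return list(columns), A, b
```

As a usage example, a system over unknowns a and b and a second system over unknowns b and c yield a three-column matrix in which the coefficients of b from both systems share one column.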

The above described example of a same illumination constraint utilizes Type C token and Type B token spatio-spectral operators. These token operators provide an excellent representation of images that include large surface areas of a single material, such as are often depicted in images including man-made objects. However, in many natural scenes there are often large areas of highly textured regions, such as sand, grass, stones, foliage, and so on. As noted above, identification of Type B tokens using Type C tokens, can be difficult in an image texture. According to a further feature of the present invention, a texton histogram operator provides a mechanism for capturing statistically uniform spatial variations of textured regions in a manner that is useful in a constraint based optimization, for example, as expressed by the [A][x]=[b] matrix equation.

Thus, according to this feature of the present invention, rather than generating Type C tokens in textured regions of an image, from intensity histograms, for use in identifying Type B tokens, as described above, texture tokens are generated as a species of Type B tokens, for use in a constraint. In an exemplary embodiment of the texton histogram operator, the operators block 28 converts each pixel of the image (or pixels of those regions of an image identified as comprising a texture) from the recorded color band representation of the respective image file 18, such as, for example, RGB color band values, to a two band representation wherein the two bands comprise a texton label and a texton histogram label. The two band representations for the pixels are then used to identify texture tokens, as will appear.

A texton label for each pixel is generated through execution of a clustering process. A texture can be characterized by a texture primitive (for example, in a grass texture, a single blade of grass), and the spatial distribution of the primitive. A texton analysis is an analytical method for characterizing a texture primitive, for example via a clustering algorithm. Clustering is a process for locating centers of natural groups or clusters in data. In an exemplary embodiment of the present invention, the data comprises pixel patches selected from among the pixels of an image being segregated into material and illumination components. For example, 3×3 pixel patches are clustered into K different groups, with each group being assigned a designating number (1, 2, 3, . . . K). The texton label for each pixel of the 3×3 array is the group number of the group to which the respective patch was assigned during the clustering process.

To expedite execution of a clustering algorithm, random samples of 3×3 patches can be selected throughout the image, or region of the image identified as comprising a texture, for processing in a clustering algorithm. After execution of the clustering algorithm by the CPU 12 (operating as the operators block 28), each 3×3 patch of the image is assigned the texton label of the closest one of the K group centers identified in the clustering process, as executed in respect of the selected random samples.

To advantage, prior to execution of a clustering algorithm, the pixels of the image are subject to an image intensity normalization. In a clustering process utilizing an intensity-based distance matrix, dark areas of an image may be placed in a single group, resulting in an under representation of groups for shadowed areas of a textured region of an image. A normalization of the image provides a more accurate texton representation for texture regions under varying illumination. A normalized intensity for a pixel can be expressed by:
i_(norm)(n,m)=log(i(n,m)/i_(b)(n,m)),
where i_(norm)(n,m) is the normalized intensity for a pixel p(n,m), i(n,m) is the intensity for the pixel p(n,m), as recorded in the image file 18, and i_(b)(n,m) is a blurred or low pass filtered version of the pixel p(n,m). For example, a 10 pixel blur radius can be used in any standard blurring function.
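A minimal sketch of the normalization in Python with NumPy follows. The box blur is merely one choice of standard blurring function, and the function names, the edge padding and the small `eps` added to avoid taking the logarithm of zero are assumptions made for this sketch.

```python
import numpy as np


def box_blur(image, radius):
    """Mean filter over a (2*radius+1) square window, with edge replication."""
    padded = np.pad(image, radius, mode="edge")
    height, width = image.shape
    size = 2 * radius + 1
    total = np.zeros_like(image, dtype=float)
    # Sum the shifted copies of the padded image that cover each window offset.
    for dy in range(size):
        for dx in range(size):
            total += padded[dy:dy + height, dx:dx + width]
    return total / size ** 2


def normalize_intensity(image, radius=10, eps=1e-6):
    """Compute i_norm = log(i / i_b) for a single-band intensity image."""
    blurred = box_blur(image, radius)
    # Assumption: eps guards against division by zero and log of zero.
    return np.log((image + eps) / (blurred + eps))
```

Under this sketch, scaling the whole image by a constant, as a uniform change of illumination would, leaves the normalized intensities essentially unchanged.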

Clustering can be executed according to any known clustering algorithm, such as, for example, K means clustering where there are K clusters or groups S_(i), i=1, 2, . . . K, and μ_(i) is the mean point or center point of all the data points x_(j)∈S_(i). In our example, each x_(j) comprises a selected 3×3 pixel patch arranged as a 9×1 vector of the nine pixels in the patch (27 elements total, including the RGB values of each of the nine pixels of the vector). As noted above, each mean point μ_(i) is assigned a texton label, 1, 2, 3 . . . K, that becomes the texton label for any pixel of a 3×3 patch clustered into the group for which the respective mean point is the center.

According to an exemplary embodiment of the present invention, the CPU 12 executes the algorithm by initially partitioning the selected 9×1 vectors, representing 3×3 pixel patches of the image, into K initial groups S_(i). The CPU 12 then calculates a center point μ_(i), for each group S_(i), utilizing an intensity-based distance matrix. After determining a center point μ_(i), for each group S_(i), the CPU 12 associates each 9×1 vector to the closest center point μ_(i), changing groups if necessary. Then the CPU 12 recalculates the center points μ_(i). The CPU 12 executes iterations of the steps of associating each 9×1 vector to the closest center point μ_(i), and recalculating the center points μ_(i), until convergence. Convergence is when there is no need to change the group for any of the 9×1 vectors. At that point, the CPU 12 assigns the group number for the respective center point μ_(i), as the texton label for the pixels of each vector in that group.
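The iteration described above can be sketched in Python with NumPy as follows. The round-robin initial partition, the squared Euclidean distance, the reuse of the previous center for a group that becomes empty and the `max_iter` cap are assumptions made for this sketch.

```python
import numpy as np


def kmeans_textons(vectors, k, max_iter=100):
    """Cluster patch vectors into K groups; return texton labels 1..K and centers.

    vectors: (n_patches, n_elements) array, one flattened pixel patch per row.
    """
    # Assumption: the initial partition assigns patches to groups in rotation.
    labels = np.arange(len(vectors)) % k
    centers = None
    for _ in range(max_iter):
        # Recalculate the center point of each group.
        new_centers = []
        for group in range(k):
            group_vectors = vectors[labels == group]
            if len(group_vectors):
                new_centers.append(group_vectors.mean(axis=0))
            else:
                new_centers.append(centers[group])   # keep center of an empty group
        centers = np.array(new_centers)
        # Associate each vector with the closest center point.
        distances = ((vectors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = distances.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                                    # convergence: no group changes
        labels = new_labels
    return labels + 1, centers
```

In the described embodiment each row would hold the 27 elements of a 3×3 RGB patch; the sketch accepts vectors of any length.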

As noted above, pixels of 3×3 patches not selected as samples for clustering are assigned the texton label of the closest one of the K group centers μ_(i), identified in the clustering process, as executed in respect of the selected random samples. A texton label map is stored by the service provider 24, and is coextensive with the pixel array of FIG. 2. In the texton label map, for each pixel location, there is an indication of the respective texton label.
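A minimal sketch of building such a texton label map is given below. It assumes an image stored as rows of (R, G, B) tuples, a squared Euclidean distance, and clamping of patch coordinates at the image border; these choices, and the names patch_vector and texton_label_map, are illustrative assumptions rather than details of the described embodiment.

```python
def patch_vector(img, n, m):
    """Flatten the 3x3 RGB patch centered at (n, m) into 27 elements.

    Coordinates falling outside the image are clamped to the nearest
    border pixel (an assumption of this sketch).
    """
    rows, cols = len(img), len(img[0])
    vec = []
    for dn in (-1, 0, 1):
        for dm in (-1, 0, 1):
            r = min(max(n + dn, 0), rows - 1)
            c = min(max(m + dm, 0), cols - 1)
            vec.extend(img[r][c])
    return vec

def texton_label_map(img, centers):
    """Label each pixel 1..K by the closest of the K group centers."""
    def nearest(vec):
        dists = [sum((a - b) ** 2 for a, b in zip(vec, mu)) for mu in centers]
        return dists.index(min(dists)) + 1
    return [
        [nearest(patch_vector(img, n, m)) for m in range(len(img[0]))]
        for n in range(len(img))
    ]
```

The returned map has the same number of rows and columns as the image, so that it is coextensive with the pixel array.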

Upon completion of the texton label assignment for pixels of the image, the CPU 12 operates to generate a texton histogram for each pixel to provide a representation of the spatial variation of texton representations within a textured region of the image. To that end, the CPU 12 accesses the texton label map. At each pixel location within the texton label map, a pixel patch of, for example, 21×21 pixels, is set up around the current location. The 21×21 patch size is far greater than the 3×3 patch size used to generate the texton representations, so as to capture the spatial variations of the texture. A texton histogram is then generated for the pixel location at the center of the 21×21 patch, in a similar manner as the intensity histogram described above. However, rather than bins based upon color band values, in the texton histogram, there is a bin for each texton label value, 1, 2, 3 . . . K. The count for each bin corresponds to the number of pixels in the 21×21 patch having the texton label value for the respective bin.
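For illustration, the histogram for a single pixel location can be computed as in the following Python sketch. The parameter half (10 for a 21×21 patch), the clipping of the patch at the image border and the function name texton_histogram are assumptions of the sketch.

```python
def texton_histogram(label_map, n, m, k, half=10):
    """Count texton labels 1..K in the (2*half+1) square patch around (n, m).

    The patch is clipped where it extends past the label map, so border
    pixels are described by fewer than (2*half+1)**2 samples.
    """
    rows, cols = len(label_map), len(label_map[0])
    hist = [0] * k
    for r in range(max(0, n - half), min(rows, n + half + 1)):
        for c in range(max(0, m - half), min(cols, m + half + 1)):
            hist[label_map[r][c] - 1] += 1  # bin index is label minus one
    return hist
```

For a 5×5 label map of all 1s with a single 2 at the center, the call texton_histogram(label_map, 2, 2, k=2, half=1) yields the counts [8, 1].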

When a texton histogram is generated for each pixel of the texton label map, the CPU 12 executes a second clustering step. In the second clustering step, the texton histograms are clustered using spectral clustering. Spectral clustering techniques use a spectrum of a similarity matrix of data of interest (in our example, the texton histograms) to reduce the dimensionality for clustering in fewer dimensions. A similarity matrix for a given set of data points A can be defined as a matrix S where S_(ij) represents a measure of the similarity between points i, j ∈ A. In our example, eigenvectors of the Laplacian are clustered using a mean shift. The distance metric is a chi-squared distance of the histograms.
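The inputs to the spectral step can be sketched as follows. The description does not give the exact forms, so the symmetric chi-squared formula with a factor of one half, the exponential kernel with scale sigma, and the unnormalized Laplacian L = D - S are all assumptions of this sketch, as are the function names. The eigenvector computation and the mean shift clustering are not shown and would ordinarily be delegated to a numerical library.

```python
import math

def chi_squared(h1, h2):
    """Assumed form: 0.5 * sum((a - b)**2 / (a + b)), skipping empty bins."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total

def similarity_matrix(hists, sigma=1.0):
    """S[i][j] = exp(-chi_squared / sigma); sigma is an assumed scale."""
    return [
        [math.exp(-chi_squared(hi, hj) / sigma) for hj in hists]
        for hi in hists
    ]

def laplacian(S):
    """Unnormalized graph Laplacian L = D - S, with D the row-sum diagonal."""
    size = len(S)
    return [
        [(sum(S[i]) if i == j else 0.0) - S[i][j] for j in range(size)]
        for i in range(size)
    ]
```

Identical histograms have a distance of zero and a similarity of one, and every row of the resulting Laplacian sums to zero, which is the property the eigenvector analysis relies upon.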

A texton histogram label (1, 2 . . . ) is assigned to each cluster group defined by the clustering procedure. For each pixel of the texton label map, the texton histogram label for the cluster group corresponding to a texton histogram that is nearest the texton histogram for the respective pixel, is assigned to that pixel. Distance is defined as the chi-squared histogram distance. Upon completion of the assignment of a texton histogram label to each pixel, each pixel is now represented by a two band, texton label, texton histogram label representation.

According to a feature of the present invention, the two band, texton label, texton histogram label representations for pixels of an image file 18 can be utilized in a constraint for construction of an [A]_(i)[x]=[b]_(i) constituent within the concatenated matrices, [A][x]=[b]. For example, it can be assumed that a region of an image wherein contiguous pixels within the region all have the same two band, texton label, texton histogram label representation, comprises a region of the same mean material of a texture depicted in the image. Such a region can be referred to as a texture token, a species of a Type B token. Thus, a constraint can be imposed that all Type C tokens within the same texture token are of the same mean material. In this constraint, the Type C tokens are the Type C tokens generated from the color band values of the constituent pixels by the Type C tokenization block 35.
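Identification of texture tokens as contiguous regions sharing the same two band representation can be sketched, by way of example only, as a connected-region labeling. The use of 4-connectivity, the breadth-first flood fill and the function name texture_tokens are assumptions of this sketch; construction of the corresponding [A]_(i)[x]=[b]_(i) rows is not shown.

```python
from collections import deque

def texture_tokens(texton, hist_label):
    """Assign a token id (1, 2, ...) to each contiguous region of pixels
    having the same (texton label, texton histogram label) pair.

    Contiguity is taken to be 4-connectivity (an assumption).
    """
    rows, cols = len(texton), len(texton[0])
    token = [[0] * cols for _ in range(rows)]  # 0 marks an unvisited pixel
    next_id = 0
    for n in range(rows):
        for m in range(cols):
            if token[n][m]:
                continue
            next_id += 1
            key = (texton[n][m], hist_label[n][m])
            token[n][m] = next_id
            queue = deque([(n, m)])
            while queue:
                r, c = queue.popleft()
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= rr < rows and 0 <= cc < cols
                            and not token[rr][cc]
                            and (texton[rr][cc], hist_label[rr][cc]) == key):
                        token[rr][cc] = next_id
                        queue.append((rr, cc))
    return token
```

Type C tokens whose pixels fall within the same token id would then be the candidates for the same mean material constraint described above.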

While the above exemplary embodiment of the present invention has been described with a user selecting constraint generators and mathematical operations via a GUI, the image segregation processing can be done in other operating modes, such as automatically, with images, constraint generators and mathematical operations being automatically selected, for example, as a function of image parameters.

In the preceding specification, the invention has been described with reference to specific exemplary embodiments and examples thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the claims that follow. The specification and drawings are accordingly to be regarded in an illustrative manner rather than a restrictive sense.

What is claimed is:
1. An automated, computerized method for processing an image, comprising the steps of: providing an image file depicting an image, in a computer memory; identifying a set of indicia measuring one of similarities and dissimilarities for selected pairs of regions of the image; transforming similarity and dissimilarity information derived from the set to pairwise distances; performing a clustering operation as a function of the pairwise distances to identify clusters; defining same intrinsic characteristic constraints as a function of the clusters; and performing an optimization operation as a function of the constraints to generate intrinsic images corresponding to the image, the intrinsic images each comprising a representation of one of material or illumination, expressed as a separate, multi-band representation for the one of material or illumination, independent of the other of the material or illumination, wherein each band corresponds to a segment of the electro-magnetic spectrum, such that when the intrinsic image comprises an illumination image, the illumination image captures the intensity and color of light incident upon each point on the surfaces depicted in the image, and when the intrinsic image comprises a material image, the material image captures reflectance properties of surfaces depicted in the image, including the percentage of each wavelength of light a surface reflects, to segregate the image into intrinsic material reflectance and illumination components.
2. The method of claim 1 wherein the same intrinsic characteristic constraints comprise same-material constraints.
3. The method of claim 1 wherein the set of indicia measuring one of similarities and dissimilarities for selected pairs of regions of the image comprises a set of classification scores generated by a computer learning technique.
4. The method of claim 1 wherein the step of performing a clustering operation as a function of the pairwise distances is carried out by performing a sparse clustering operation.
5. A computer program product, disposed on a non-transitory computer readable media, the product including computer executable process steps operable to control a computer to: provide an image file depicting an image, in a computer memory, identify a set of indicia measuring one of similarities and dissimilarities for selected pairs of regions of the image, transform similarity and dissimilarity information derived from the set to pairwise distances, perform a clustering operation as a function of the pairwise distances to identify clusters, define same intrinsic characteristic constraints as a function of the clusters, and perform an optimization operation as a function of the constraints to generate intrinsic images corresponding to the image, the intrinsic images each comprising a representation of one of material or illumination, expressed as a separate, multi-band representation for the one of material or illumination, independent of the other of the material or illumination, wherein each band corresponds to a segment of the electro-magnetic spectrum, such that when the intrinsic image comprises an illumination image, the illumination image captures the intensity and color of light incident upon each point on the surfaces depicted in the image, and when the intrinsic image comprises a material image, the material image captures reflectance properties of surfaces depicted in the image, including the percentage of each wavelength of light a surface reflects, to segregate the image into intrinsic material reflectance and illumination components.
6. The computer program product of claim 5 wherein the same intrinsic characteristic constraints comprise same-material constraints.
7. The computer program product of claim 5 wherein the set of indicia measuring one of similarities and dissimilarities for selected pairs of regions of the image comprises a set of classification scores generated by a computer learning technique.
8. The computer program product of claim 5 wherein the process step to control the computer to perform a clustering operation as a function of the pairwise distances is carried out by providing a process step to control the computer to perform a sparse clustering operation.
9. An automated, computerized method for processing an image, comprising the steps of: providing an image file depicting an image, in a computer memory; identifying a set of indicia measuring one of similarities and dissimilarities for each of a series of selected pairs of regions of the image; selecting a pair of regions of the image from the series; performing a segregation of the image into intrinsic images, assuming a same intrinsic characteristic for the selected pair of regions, the step of performing being carried out by performing an optimization operation as a function of a constraint based upon the same intrinsic characteristic for the selected pair of regions, to generate intrinsic images corresponding to the image, the intrinsic images each comprising a representation of one of material or illumination, expressed as a separate, multi-band representation for the one of material or illumination, independent of the other of the material or illumination, wherein each band corresponds to a segment of the electro-magnetic spectrum, such that when the intrinsic image comprises an illumination image, the illumination image captures the intensity and color of light incident upon each point on the surfaces depicted in the image, and when the intrinsic image comprises a material image, the material image captures reflectance properties of surfaces depicted in the image, including the percentage of each wavelength of light a surface reflects, to segregate the image into intrinsic material reflectance and illumination components; evaluating the intrinsic images according to preselected criteria; retaining the pair as being of the same intrinsic characteristic when the evaluation satisfies the criteria; and repeating the selecting, performing, evaluating and retaining steps for each pair of the series.
10. The method of claim 9 wherein the selected pair of regions is selected on the basis of being most probably of a same intrinsic characteristic of the image.
11. A computer program product, disposed on a non-transitory computer readable media, the product including computer executable process steps operable to control a computer to: provide an image file depicting an image, in a computer memory, identify a set of indicia measuring one of similarities and dissimilarities for each of a series of selected pairs of regions of the image, select a pair of regions of the image from the series, perform a segregation of the image into intrinsic images, assuming a same intrinsic characteristic for the selected pair of regions the process step to perform being carried out by performing an optimization operation as a function of a constraint based upon the same intrinsic characteristic for the selected pair of regions, to generate intrinsic images corresponding to the image, the intrinsic images each comprising a representation of one of material or illumination, expressed as a separate, multi-band representation for the one of material or illumination, independent of the other of the material or illumination, wherein each band corresponds to a segment of the electro-magnetic spectrum, such that when the intrinsic image comprises an illumination image, the illumination image captures the intensity and color of light incident upon each point on the surfaces depicted in the image, and when the intrinsic image comprises a material image, the material image captures reflectance properties of surfaces depicted in the image, including the percentage of each wavelength of light a surface reflects, to segregate the image into intrinsic material reflectance and illumination components, evaluate the intrinsic images according to preselected criteria, retain the pair as being of the same intrinsic characteristic when the evaluation satisfies the criteria and repeat the selecting, performing, evaluating and retaining steps for each pair of the series.
12. The computer program product of claim 11 wherein the selected pair of regions is selected on the basis of being most probably of a same intrinsic characteristic of the image.